(57) Abstract

The present invention provides crystallized protective protein/cathepsin A (PPCA), a precursor thereof (pPPCA)
or at least one subdomain thereof; methods for x-ray diffraction analysis to provide x-ray diffraction patterns of
sufficiently high resolution for three-dimensional structure determination of the protein; as well as methods for
rational drug design (RDD), based on using amino acid sequence data and/or x-ray crystallography data provided on
computer readable media, as analyzed on a computer system having suitable computer algorithms.
Protective Protein/Cathepsin A and Precursor: Crystallization, X-Ray Diffraction, Three-Dimensional
Structure Determination and Rational Drug Design



Background of the Invention 

Statement as to Rights to Inventions Made Under
Federally-Sponsored Research and Development

Part of the work performed during development of this invention utilized U.S. Government funds. The U.S. 
Government has certain rights in this invention. 
Field of the Invention 

The present invention is in the fields of molecular biology, protein purification, protein crystallization, x-ray
diffraction analysis, three-dimensional structure determination and rational drug design (RDD). The present invention
provides crystallized protective protein/cathepsin A (PPCA) and its precursor (pPPCA). The crystallized PPCA or
pPPCA is analyzed by x-ray diffraction techniques. The resulting x-ray diffraction patterns are of sufficiently high
resolution to be useful for determining the three-dimensional structure of the PPCA or pPPCA protein, and for RDD.

Related Background Art

The human protective protein/cathepsin A (PPCA, also known as human protective protein or HPP) has been
identified as the primary genetic defect underlying galactosialidosis (d'Azzo et al., Proc. Natl. Acad. Sci. U.S.A.
79:4535-4539 (1982)), a lysosomal storage disease inherited as an autosomal recessive trait. Patients with this disorder
are diagnosed as having drastically reduced β-galactosidase and neuraminidase activities in their cell lysosomes.
Examples of lysosomal storage diseases are presented in Table 316-1 of Braunwald et al., eds., Harrison's Principles of
Internal Medicine, 11th Ed., pp. 1661-1671, McGraw Hill Book Co., New York (1987); as well as Wenger et al., Biochem.
Biophys. Res. Commun. 52:589-595 (1978); Tettamanti et al., eds., Sialidases and Sialidosis. Perspectives in Inherited
Metabolic Diseases, Vol. 4, Edi. Ermes, Milano (1981), pp. 261-279 and 379-395; and van Diggelen et al., Lancet
2:804 (1987), which references are entirely incorporated herein by reference.

Researchers have proposed that one of PPCA's functions is to stabilize β-galactosidase and neuraminidase in
a multi-enzyme complex, which complex is deficient in galactosialidosis patients (d'Azzo et al. (1982), infra; Hoogeveen
et al. (1983), infra). Evidence for this protective function comes from studies showing that PPCA is taken up from the
culture medium by galactosialidosis fibroblasts and that PPCA restores both β-galactosidase and neuraminidase activities
to these fibroblasts (d'Azzo et al. (1982), infra).

The cDNA for PPCA directs the synthesis of a 452 amino acid precursor PPCA (pPPCA) (Figure 13) with a
molecular weight of 54 kDa (Galjart et al., Cell 54:755-764 (1988)). The amino acid sequences of PPCA (Figure 14)
and pPPCA (Figure 13) contain two glycosylation sites (Asn 117 and Asn 305), both of which are glycosylated in
cultured fibroblasts and cells over-expressing PPCA or pPPCA. pPPCA dimerizes soon after synthesis in the
endoplasmic reticulum (ER) (Zhou et al., EMBO J. 10:4041-4048 (1991)).

Lysosomal PPCA has cathepsin A/deamidase/esterase activities which are exerted in vitro on a specific subset
of bioactive peptides. Non-limiting examples of those hydrolyzed by PPCA are: substance P and substance P-free acid;
oxytocin and oxytocin-free acid; neurokinin A; angiotensin I; and bradykinin (Jackman (1990), infra). Furthermore, the
enzyme inactivates endothelin I activity in rat smooth muscle cells and normal human tissues. This activity was deficient
in liver from a galactosialidosis patient (Itoh (1995), infra; Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)).

Endothelins (ET-1, ET-2 and ET-3) are potent vasoconstrictors and elevate blood pressure in mammals. They
also influence cell proliferation and hormone production and have been implicated in cardiovascular disorders, ranging
from hypertension to stroke to ischemic heart disease (Rubanyi and Polokoff, Pharmacol. Rev. 46:325-415 (1994)).

The three-dimensional structure of a PPCA or a pPPCA has not previously been published, which structure
could delineate specific biological activities and ligands as therapeutics for PPCA-related pathologies. Accordingly,
there is a need to provide three-dimensional structures of at least one PPCA, pPPCA or ligands for diagnosis or therapy
of PPCA-related pathologies.
Summary of the Invention

The present invention provides methods of expressing, purifying and crystallizing a human protective
protein/cathepsin A (PPCA) and its precursor, precursor protective protein/cathepsin A (pPPCA). The present invention
also provides methods for obtaining crystallized PPCA or pPPCA that can be analyzed to obtain x-ray diffraction
patterns of sufficiently high resolution to be useful for three-dimensional structure determination of the protein.

The x-ray diffraction patterns can be either analyzed directly to provide the three-dimensional structure (if of
sufficiently high resolution), or atomic coordinates for the crystallized PPCA or pPPCA, as provided herein, can be
used for structure determination. The x-ray diffraction patterns obtained by methods of the present invention,
and provided on computer readable media, are used to provide electron density maps. The amino acid sequence is also
useful for three-dimensional structure determination. The data is then used in combination with phase determination
(e.g., using multiple isomorphous replacement (MIR) or molecular replacement techniques) to generate electron density
maps of a PPCA or a pPPCA, using a suitable computer system.

The electron density maps, provided by analysis of either the x-ray diffraction patterns or working backwards
from the atomic coordinates, provided herein, are then fitted using suitable computer algorithms to generate secondary,
tertiary and/or quaternary domains of a PPCA or a pPPCA, which domains are then used to provide an overall
three-dimensional structure, as well as expected binding and active sites of the PPCA or pPPCA. pPPCA has some of the
active and binding sites of PPCA, except for changes in structure due to the presence of the portion of the pPPCA which
is deleted during maturation to PPCA (e.g., residues 285-298 of Figure 13).

Structure determination methods and computer systems are also provided by the present invention for rational
drug design (RDD). These RDD methods use computer modeling programs to find potential ligands that are calculated
to associate with, or bind to, sites or domains of a PPCA or a pPPCA. Potential ligands are then screened for modulating
or binding activity. Such screening methods can be selected from assays for at least one PPCA-specific structural feature
or biological activity, preferably as associated with a PPCA- or pPPCA-related pathology, e.g., protective activity (e.g.,
modulation of β-galactosidase activity and neuraminidase (NA) activity); and peptide or enzyme modulating activity
(e.g., of endothelin I (serine carboxypeptidase), neuropeptides, cathepsin A, and the like), according to known assays.
The resulting ligands provided by methods of the present invention are synthesized and are useful for treating, inhibiting
or preventing at least one PPCA-related pathology in a mammal.

Other objects of the invention will be apparent to one of ordinary skill in the art from the following detailed 
description and examples relating to the present invention. 
Brief Description of the Figures

Figure 1 is a schematic ribbon diagram of the PPCA monomer (monomer 1), where secondary structure
assignments are according to DSSP (Kabsch and Sander, Biopolymers 22:2577-2637 (1983)). The 'core' domain is
shown in yellow. The 'cap' domain consists of a 'helical' subdomain, in red, and a 'maturation' subdomain, in orange.
The catalytic triad Ser 150, His 429 and Asp 372 (from right to left) is shown by small green spheres. (Figure generated
using MOLSCRIPT (Kraulis, J. Appl. Cryst. 24:946-950 (1991))).

Figure 2 is a stereo diagram of the Cα trace of the PPCA monomer 1 with numbering of selected
residues. The residues forming the α-helices and β-strands are as follows according to DSSP:

Core domain: Cβ1(21-27); Cβ2(32-39); Cβ3(50-54); Cα1(63-67); Cβ4(73-75); Cβ5(82-84); Cβ6(94-98);
Cα2(118-135); Cβ7(144-149); Cα3(152-163); Cβ8(171-177); Cα4(307-313); Cα5(316-321); Cα6(336-341);
Cα7(350-359); Cβ9(363-369); Cα8(377-386); Cβ10(391-401); Cβ11(407-416); Cβ12(419-424); Cα9(431-434);
Cα10(436-447);

Cap domain: Hα1(183-196); Hα2(202-212); Hα3(226-240); Mβ1(261-264); Mβ2(267-270); Mα1(290-293);
Mβ3(296-299). Note that for monomer 2 the secondary structure assignments in the cap domain are slightly different
than in monomer 1. Residues in Hβ1 are in a region of poor density and Mα1 is an extended coil. (Figure generated
using MOLSCRIPT (Kraulis (1991), infra)).
Figure 3 shows the density for the disulfide bridges Cys 212-Cys 228 and Cys 213-Cys 218, as
revealed in the SigmaA weighted 2mFo-DFc electron density map (Read, Acta Crystallogr. A 42:140-149 (1986))
calculated from the model refined to 2.2 Å; the map has been contoured at 1σ. (Figure drawn with the O computer
program (Jones, Acta Crystallogr. A 47:110-119 (1991))).
Figure 4 is a stereo diagram of the superimposed Cα traces from the two crystallographically
independent PPCA monomers forming the dimer. Monomer 1 is in blue, monomer 2 is in red. Residues referred to in
the text are labeled. Residues 259 and 260 have not been incorporated in the model of monomer 2, since no electron
density was observed for them. Note the tremendous difference in conformation of the excision peptide located in the
upper right corner of the proteins. (Figure generated by MOLSCRIPT (Kraulis (1991), infra)).

Figure 5 is a schematic ribbon diagram of the PPCA dimer viewed approximately along the two-fold
axis. For monomer 1, the core domain is yellow while the cap domain consists of a helical subdomain in red and
a maturation subdomain in orange. For monomer 2, the core domain is green, while the cap domain consists of a blue
helical subdomain and a light blue maturation subdomain. (Figure generated using MOLSCRIPT (Kraulis (1991), infra)).

Figure 6A-B is a representation of the molecular surface of the PPCA dimer. The surface was calculated with
GRASP (Nicholls, A., et al., Proteins 11:281-296 (1991)) and colored according to the electrostatic potential. Dark blue
corresponds to a positive potential > +15.0 kT/e and dark red to a negative potential < -15.0 kT/e. Figure 6A: standard
view, along the diad with the dimer oriented as in Figure 4. Figure 6B: side view of the dimer, rotated ninety degrees
with respect to 6A.

Figure 7A-F presents a topological comparison of 6 members of the hydrolase fold family. The arrangement
of structural elements in the central core domain (in green and yellow) of the different proteins is generally similar. The
cap domains (in red) vary greatly. The following structures are shown starting from the top left hand corner (references
and PDB entry codes are given in between brackets): Figure 7A shows the PPCA precursor, whose cap domain consists of
two subdomains, one α-helical and the other mainly β-sheet; Figure 7B shows CPW (3SC2, Liao et al. (1992), infra), cap
domain helical; Figure 7C shows CPY (1YSC, Endrizzi et al. (1994), infra), cap domain helical; Figure 7D shows
dehalogenase (2HAD, Franken et al., EMBO J. 10:1297-1302 (1991)), cap domain helical but quite different from the
serine carboxypeptidases; Figure 7E shows lipase from Pseudomonas glumae (1TAH, Noble et al., FEBS Lett.
331:123-128 (1993)), cap domain mixed α-helical and β-strands; and Figure 7F shows acetylcholine esterase (1ACE,
Sussman et al., Science 253:872-879 (1991)), cap domain large and predominantly α-helical. The secondary structure
assignments were generated with the computer program O, using structures provided and/or available from the
Brookhaven Protein Data Bank. (This Figure was generated using MOLSCRIPT (Kraulis (1991), infra)).

Figure 8A-B shows superpositions of PPCA and CPW. Figure 8A shows the superposition of the Cα traces from the
PPCA and CPW monomers, showing that the major differences between the two enzymes are localized in the cap domain.
PPCA has a large 'maturation' subdomain and the 'helical' subdomain is rotated with respect to the CPW counterpart.
(Figure drawn with the O program (Jones (1991), infra)). Figure 8B shows the Cα traces from the PPCA and CPW dimers
after the core domains from the subunits (shown on the right hand side of the two dimers) have been superimposed.
Notice the remarkable difference in mutual orientation (of 15°) of the two subunits on the left hand side of the two
dimers, which has been accentuated by an arrow. (Figure drawn with the O computer program (Jones (1991), supra)).

Figure 9 is a stereo view of the Cα trace of PPCA monomer 1 highlighting regions involved in the maturation
event. The color scheme for the trace is as follows: core domain in light blue, helical subdomain in red, and maturation
subdomain in orange, with the exception of the excision peptide (residues 285-298), which is shown in blue. Orange
spheres mark residues 272 and 277, the beginning and end of the blocking peptide. The catalytic triad Ser
150, His 429 and Asp 372 is shown as light blue spheres. Two cysteines, Cys 253 and Cys 303, referred to in the
discussion are colored green. (This Figure was generated using MOLSCRIPT (Kraulis (1991), infra)).

Figure 10 is a close-up representation of the 'blocking' peptide (residues 272-277) bound in the active site,
rendering the catalytic triad solvent inaccessible. Residues from the maturation subdomain are shown in orange, residues
from the helical domain in magenta and residues from the core domain in cyan. The excision peptide is shown in blue.
Side chains are shown for residues making extensive contacts with the blocking peptide or if mentioned in the text. The
catalytic triad is shown in white. (Figure drawn with O (Jones (1991), infra)).

Figure 11 is a representation of elements proposed to be involved in the activation mechanism of the precursor
form of PPCA as discussed in the text. The Cα trace of the core domain is shown in cyan, the helical subdomain in red,
the maturation subdomain in orange, and the excision peptide is shown in blue. Relevant side chains are depicted and 
labeled. Rearrangement of the residues 254-302 limited by the disulfide Cys 253 and Cys 303 would free up the active 
site cleft. A charge cluster Arg 262, Glu 264, Arg 298 and Asp 300 occupies a strategic position within the maturation 
subdomain, possibly involved in pH dependent regulation of conformational changes. The solvent accessible surface 
was calculated and visualized with the atomic coordinates by BIOGRAF (BIOGRAF Construct Users Guide Version
3.2.1, June 1993).

Figure 12 is a schematic representation of the proposed activation of PPCA. The active site cleft is formed by 
the core domain (indicated as 'core' in the above scheme) and the helical subdomain (indicated as 'a'). The maturation
subdomain (indicated as 'm') contains the residues that block the active site cleft rendering the precursor enzymatically 
inactive, shown in structure 1. In the acidic endosome/lysosome, the precursor undergoes activation. In activation 
pathway 2a, conformational rearrangements induced by low pH might render the excision peptide more accessible to 
proteases as a first step, followed by cleavage of the polypeptide chain removing the excision peptide. Alternatively, 
in pathway 2b, proteolytic cleavage of the excision peptide might form the trigger for the total rearrangement, removing 
the blocking peptide from the active site and thus generating the fully active enzyme as shown in structure 3. 

Figure 13 shows the amino acid sequence of a human pPPCA. The underlined portion (residues 285-298) 
shows an excision peptide for conversion to the mature form, PPCA. 

Figure 14 shows the amino acid sequence of a human PPCA. 

Figure 15 shows a sequence alignment between pPPCA, CPW and CPY (top three sequences shown). Identical 
residues among all three sequences are boxed. Residue numbering is included for the pPPCA amino acid sequence. 
The alignment was made using the GCG program PILEUP (GCG version 8), then manually adjusted using 3D-structural 
knowledge from the superposition of the CPW (Liao et al., 1992) and CPY (Endrizzi et al., 1994) atomic coordinates.
The alignment was later used to design a multi-Ala search probe for molecular replacement calculations shown in the 
fourth sequence shown as 'model'. The structure determination of pPPCA subsequently revealed that the protein can 
be divided in two domains: a 'core' domain (residues 1-182 and 303-452) and 'cap' domain (residues 183-302). The 
secondary structure elements for the PPCA precursor are depicted with shaded bars (for details on the assignment and 
nomenclature, see Rudenko et al., Structure 5:1249-1259 (1988)).

Figure 16 shows a schematic representation of a 'bootstrapping' cycle as described in Example 2.

Figure 17 is a representation of an initial molecular mask enlarged to accommodate missing areas in the model.
The program MAMA (Kleywegt & Jones, 1994) was used to calculate the mask and mask editing options in O (Jones
et al., 1991) were used to extend the mask.

Figure 18 is a representation of an enlargement of the model during the bootstrapping procedure plotted as a
function of the expansion step. The number of Cα atoms incorporated in the model per monomer is given (-«-) as
well as the number of correct side chains (-»--). Note that after the first round of building in the molecular replacement
map (expansion step to*), 37 residues from the molecular replacement search probes had to be deleted from the model,
reducing the number of Cα atoms to 294. Subsequent cycles allowed for the model to be expanded by small increments.

Figure 19 is a representation of a comparison of the Cα trace from a monomer core model (shown in magenta)
and the complete PPCA monomer (shown in yellow). The core model contained only 294 Cα atoms. The 452 residue
PPCA monomer consists of a core domain and a cap domain. The helical subdomain and the maturation subdomain
forming the cap domain have been shown in the figure above.
Figure 20A-D is a representation of the resolving power of the bootstrapping procedure showing three different
stages in map quality. The atomic coordinates of the refined model are visualized with the electron density in Figures
20B, 20C and 20D. Figures 20A and 20B show the initial 2m|Fobs|-D|Fcalc| SigmaA weighted map calculated using
phases from the molecular replacement solution. The electron density is essentially uninterpretable. Fig. 20C shows the
twofold averaged 2|Fobs|-|Fcalc| electron density map calculated using inverted phases from cycle bmc6. The density for
β-strand Mβ2 (residues 266-271) has become clearly visible. Fig. 20D shows the unaveraged 2m|Fobs|-D|Fcalc| SigmaA
weighted map calculated using phases from the refined model. The quality of the density is very good. Density for the
helix Mα1 (residues 287-293), which assumes a different conformation in the two monomers, is now also apparent.

Figure 21 shows a Ramachandran plot calculated for one monomer from a refined model of a pPPCA. Both 
monomers in the asymmetric unit give essentially equivalent plots. 

Figure 22 shows a schematic of a computer system for PPCA or pPPCA structure determination and/or rational 
drug design. 

Figure 23.1-52 lists the atomic coordinates for the active site of a pPPCA dimer having the amino acid 
sequence presented as portions of at least one of 50-76, 144-155, 173-197, 226-253, 226-288, 294-310, 327-344, 338- 
350, 366-381 and 423-436 of (Figure 23.1-23.26) 452 amino acids (designated 1-452) of monomer 1, as well as 
corresponding portions of (Figure 23.26-23.52) 452 amino acids (designated 1001-1452) of monomer 2. 

Detailed Description of the Preferred Embodiments 

The present invention provides methods for expressing, purifying and crystallizing a protective 
protein/cathepsin A (PPCA) or a precursor protective protein/cathepsin A (pPPCA), where the crystals diffract x-rays 
with sufficiently high resolution to allow determination of the three-dimensional structure of the PPCA or pPPCA, or 
a portion or subdomain thereof. The three-dimensional structure (e.g., as provided on computer readable media of the
present invention) is useful for rational drug design of ligands of a PPCA or a pPPCA. Such ligands can be synthesized
or recombinantly produced and are useful as diagnostic agents or drugs for diagnosing, treating, inhibiting or preventing
at least one PPCA- or pPPCA-related pathology. 

The determined structure is made using the PPCA or pPPCA amino acid sequences and/or atomic coordinate/x- 
ray diffraction data, which are analyzed to provide atomic model output data corresponding to the three-dimensional 
structure, e.g., as provided on computer readable media. The computer analysis of the atomic coordinate/x-ray
diffraction data and/or the amino acid sequence allows the calculation of the secondary, tertiary and/or quaternary 
structures; domains; and/or subdomains of the protein. These domains are combined and refined by additional 
calculations using suitable computer subroutines to determine the most probable or actual three-dimensional structure 
of the PPCA or pPPCA, including potential or actual active sites, binding sites or other structural or functional domains 
or subdomains of the protein. 

Structure determination methods are also provided by the present invention for rational drug design (RDD) of 
PPCA or pPPCA ligands. Such drug design uses computer modeling programs that calculate different molecules
expected to interact with the determined active sites, binding sites, or other structural or functional domains or 
subdomains of a PPCA or a pPPCA. These ligands can then be produced and screened for activity in modulating or 
binding to a PPCA or pPPCA, according to methods and compositions of the present invention. 

The actual PPCA or pPPCA-ligand complexes can optionally be crystallized and analyzed using x-ray 
diffraction techniques. The diffraction patterns obtained are similarly used to calculate the three-dimensional interaction 
of the ligand and the PPCA or pPPCA, to confirm that the ligand binds to, or changes the conformation of, particular
domain(s) or subdomain(s) of the PPCA or pPPCA. Such screening methods are selected from assays for at least one
biological activity of a PPCA or a pPPCA. The resulting ligands, provided by methods of the present invention,
modulate or bind at least one PPCA or pPPCA and are useful for diagnosing, treating or preventing PPCA- or pPPCA- 
related pathologies in animals, such as humans. Ligands of a particular PPCA or pPPCA can similarly modulate other 
PPCAs or pPPCAs from other sources, such as other eukaryotes. 
A PPCA or pPPCA is also provided as a crystallized protein suitable for x-ray diffraction analysis. The x-ray
diffraction patterns obtained by the x-ray analysis are of moderate, to moderately high, to high resolution, e.g., 30-10,
10-3.5 or 1.5-3.5 Å, respectively, with the higher resolutions included. These diffraction patterns are suitable and useful
for three-dimensional structure determination of a PPCA or a pPPCA, or a domain or subdomain thereof.

The determination of the three-dimensional structure of a PPCA or pPPCA has a broad-based utility.
Significant sequence identity and conservation of important structural elements are expected to exist among different
PPCAs or pPPCAs. Therefore, the three-dimensional structure from one or few PPCAs or pPPCAs can be used to
identify ligands that have diagnostic or therapeutic value for at least one PPCA- or pPPCA-related pathology that may
involve PPCAs or pPPCAs having different amino acid sequences.

Determination of Protein Structures

Different techniques give different and complementary information about protein structure. The primary
structure is obtained by biochemical methods, either by direct determination of the amino acid sequence from the
protein, or from the nucleotide sequence of the corresponding gene or cDNA. The quaternary structure of large proteins
or aggregates can also be determined by electron microscopy. To obtain the secondary and tertiary structure, which
requires detailed information about the arrangement of atoms within a protein, x-ray crystallography is preferred. See,
e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.

The first prerequisite for solving the three-dimensional structure of a protein by x-ray crystallography is a
well-ordered crystal that will diffract x-rays strongly. The crystallographic method directs a beam of x-rays onto a
regular, repeating array of many identical molecules so that the x-rays are diffracted from it in a pattern from which
the structure of an individual molecule can be retrieved. Globular protein molecules are large, spherical, or
ellipsoidal objects with irregular surfaces, and crystals thereof contain large holes or channels that are formed between
the individual molecules. These channels, which usually occupy more than half the volume of the crystal, are filled with
disordered solvent molecules. The protein molecules are in contact with each other at only a few small regions. This
is one reason why structures of proteins determined by x-ray crystallography are generally the same as those for the
proteins in solution.

The formation of crystals is dependent on a number of different parameters, including pH, temperature, protein
concentration, the nature of the solvent and precipitant, as well as the presence of added ions or ligands to the protein.
Many routine crystallization experiments may be needed to screen all these parameters for the few combinations that
might give crystals suitable for x-ray diffraction analysis. Crystallization robots can automate and speed up the work of
reproducibly setting up large numbers of crystallization experiments.

A pure and homogeneous protein sample is important for successful crystallization. Proteins obtained from
cloned genes in efficient expression vectors can be purified quickly to homogeneity in large quantities in a few
purification steps. A protein to be crystallized is preferably at least 93-99% pure according to standard criteria of
homogeneity. Crystals form when molecules are precipitated very slowly from supersaturated solutions. The most
frequently used procedure for making protein crystals is the hanging-drop method, in which a drop of protein solution
is brought very gradually to supersaturation by loss of water from the droplet to the larger reservoir that contains salt
or polyethylene glycol solution.

Different crystal forms can be more or less well-ordered and hence give diffraction patterns of different quality.
As a general rule, the more closely the protein molecules pack, and consequently the less water the crystals contain, the
better is the diffraction pattern because the molecules are better ordered in the crystal.

X-rays are electromagnetic radiation at short wavelengths, emitted when electrons jump from a higher to a
lower energy state. In conventional sources in the laboratory, x-rays are produced by high-voltage tubes in which a
metal plate, the anode, is bombarded with accelerated electrons and thereby caused to emit x-rays of a specific
wavelength, so-called monochromatic x-rays. The high voltage rapidly heats up the metal plate, which therefore has
to be cooled. Efficient cooling is achieved by so-called rotating anode x-ray generators, where the metal plate revolves
during the experiment so that different parts are heated up.

More powerful x-ray beams can be produced in synchrotron storage rings where electrons (or positrons) travel
close to the speed of light. These particles emit very strong radiation at all wavelengths from short gamma rays to visible
light. When used as an x-ray source, only radiation within a window of suitable wavelengths is channeled from the
storage ring. Polychromatic x-ray beams are produced by having a broad window that allows through x-ray radiation
with wavelengths of 0.2 - 3.5 Å.

In diffraction experiments a narrow and parallel beam of x-rays is taken out from the x-ray source and directed
onto the crystal to produce diffracted beams. The incident primary beam causes damage to both protein and solvent
molecules. The primary beam must strike the crystal from many different directions to produce all possible diffraction
spots, and so the crystal is rotated in the beam during the experiment.

The diffracted spots are recorded either on a film, the classical method, or by an electronic detector. The
exposed film has to be measured and digitized by a scanning device, whereas electronic detectors feed the signals they
detect directly in a digitized form into a computer. Electronic area detectors significantly reduce
the time required to collect and measure diffraction data.

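When the primary beam from an x-ray source strikes the crystal, some of the x-rays interact with the electrons
on each atom and cause them to oscillate. The oscillating electrons serve as a new source of x-rays, which are emitted
in almost all directions. When atoms (and hence their electrons) are arranged in a regular
three-dimensional array, as in a crystal, the x-rays emitted from the oscillating electrons interfere with one another. In
most cases these x-rays cancel each other out; in certain directions, however, they add together to produce
diffracted beams of radiation that can be recorded as a pattern on a photographic plate or detector.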

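The diffraction pattern obtained in an x-ray experiment is related to the crystal that caused the diffraction.
Diffraction by a crystal can be regarded as the reflection of the primary beam by sets of parallel planes through the
unit cells of the crystal. X-rays that are reflected from adjacent planes travel different distances, and diffraction
occurs only when the difference in distance is equal to the wavelength of the x-ray beam. This distance is dependent on
the reflection angle, which is equal to the angle between the primary beam and the planes.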

The relationship between the reflection angle (θ), the distance between the planes (d), and the wavelength (λ)
is given by Bragg's law: 2d sin θ = λ.

Briefly, the position on the film of the diffraction data relates each spot to a specific set of planes through the crystal.
By using Bragg's law, these positions can be used to determine the size of the unit cell.
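
As a non-limiting illustration, Bragg's law in its general form nλ = 2d sin θ can be evaluated numerically. The
following is a minimal Python sketch; the function names are hypothetical and not part of the original disclosure, and
a Cu Kα laboratory wavelength of 1.5418 Å is assumed purely for the example. It computes the reflection angle at which
planes spaced 2.2 Å apart diffract, and recovers the spacing from that angle.

```python
import math


def bragg_angle(wavelength_angstrom, d_angstrom, order=1):
    """Reflection angle theta (degrees) satisfying n*lambda = 2*d*sin(theta)."""
    ratio = order * wavelength_angstrom / (2.0 * d_angstrom)
    if ratio > 1.0:
        # No angle satisfies Bragg's law: the spacing is too small for this wavelength.
        raise ValueError("spacing cannot be resolved at this wavelength")
    return math.degrees(math.asin(ratio))


def bragg_spacing(wavelength_angstrom, theta_deg, order=1):
    """Interplanar spacing d (angstrom) from a measured reflection angle."""
    if not 0.0 < theta_deg <= 90.0:
        raise ValueError("theta must be in the range (0, 90] degrees")
    return order * wavelength_angstrom / (2.0 * math.sin(math.radians(theta_deg)))


# Assumed example values: Cu K-alpha radiation and a 2.2 angstrom plane spacing.
CU_K_ALPHA = 1.5418
theta = bragg_angle(CU_K_ALPHA, 2.2)
spacing = bragg_spacing(CU_K_ALPHA, theta)
```

Because sin θ cannot exceed 1, the smallest spacing that can be observed at a given wavelength is λ/2, which is why
the sketch raises an error for spacings below that limit.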

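Each atom in a crystal scatters x-rays in all directions, and only those that positively interfere with one another,
according to Bragg's law, give rise to diffracted beams that can be recorded as a distinct diffraction spot above
background. Each diffraction spot is the result of interference of all x-rays with the same diffraction angle emerging
from all atoms. For example, for the protein crystal of myoglobin, each of the about 20,000 diffracted beams that have
been measured contains scattered x-rays from each of the around 1500 atoms in the molecule. To extract information
about individual atoms from such a system requires considerable computation. The mathematical tool that is used to
handle such problems is called the Fourier transform.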
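
The way each diffracted beam sums contributions from every atom can be sketched with the standard structure factor
summation F(hkl) = Σ f_j exp[2πi(hx_j + ky_j + lz_j)]. The Python example below is an illustrative sketch only: the
function name, the two-atom cell and the constant scattering weights are assumptions made for the example (real atomic
scattering factors fall off with angle), and it is not the program used in the present invention.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum the waves scattered by every atom into reflection (h, k, l).

    atoms: iterable of (f, (x, y, z)), where f is a constant scattering
    weight (a simplifying assumption) and x, y, z are fractional
    unit-cell coordinates.
    """
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


# Hypothetical cell: two identical atoms half a cell apart along x.
atoms = [(6.0, (0.0, 0.0, 0.0)), (6.0, (0.5, 0.0, 0.0))]

f_100 = structure_factor((1, 0, 0), atoms)  # scattered waves are out of phase
f_200 = structure_factor((2, 0, 0), atoms)  # scattered waves are in phase
```

In this toy cell the (1,0,0) contributions cancel while the (2,0,0) contributions reinforce, which mirrors the
statement that only positively interfering x-rays give rise to a recordable spot.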

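Each diffracted beam, which is recorded as a spot on the film, is defined by three properties: the amplitude,
which can be measured from the intensity of the spot; the wavelength, which is set by the x-ray source; and the phase,
which is lost in x-ray experiments. All three properties are needed for all of the diffracted beams in order to determine
the position of the atoms giving rise to the diffracted beams.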
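
The need for phases as well as amplitudes can be illustrated with a one-dimensional Fourier synthesis. The following is a minimal sketch under stated assumptions, not the method of the invention: the two-atom "structure", its positions and scattering weights, and all function names are hypothetical. With the true phases the synthesized density peaks at the heavier atom; with the phases discarded (set to zero) the peak moves to the origin and the atomic positions are lost.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Return {h: F(h)} for a 1-D cell; atoms is a list of (fractional x, weight)."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(amplitudes, phases, x):
    """Fourier synthesis: rho(x) = sum over h of |F(h)| * cos(2*pi*h*x - phase(h))."""
    return sum(
        amplitudes[h] * math.cos(2 * math.pi * h * x - phases[h])
        for h in amplitudes
    )


def peak_index(amplitudes, phases, n_grid=100):
    """Grid index (x = index / n_grid) at which the synthesized density is largest."""
    values = [density(amplitudes, phases, i / n_grid) for i in range(n_grid)]
    return values.index(max(values))


# Hypothetical structure: a lighter atom at x = 0.2 and a heavier atom at x = 0.7.
atoms = [(0.2, 6.0), (0.7, 8.0)]
F = structure_factors(atoms, h_max=10)
amplitudes = {h: abs(v) for h, v in F.items()}       # measurable from spot intensities
true_phases = {h: cmath.phase(v) for h, v in F.items()}  # lost in the experiment
no_phases = {h: 0.0 for h in F}                      # what remains if phases are ignored
```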

For larger molecules, protein crystallographers have determined the phases in many cases using a method called
multiple isomorphous replacement (MIR) (including heavy metal scattering), which requires the introduction of new
x-ray scatterers into the unit cell of the crystal. These additions are usually heavy atoms (so that they make a significant
contribution to the diffraction pattern), such that there should not be too many of them (so that their positions can be
located); and they should not change the structure of the molecule or of the crystal cell, i.e., the crystals should be
isomorphous. Isomorphous replacement is usually done by diffusing different heavy-metal complexes into the channels
of the preformed protein crystals. The protein molecules expose side chains (such as SH groups) into these solvent
channels that are able to bind heavy metals. It is also possible to replace endogenous light metals in metalloproteins with
heavier ones, e.g., zinc by mercury, or calcium by samarium.

Since such heavy metals contain many more electrons than the light atoms (H, N, C, O, and S) of the protein,
they scatter x-rays more strongly. All diffracted beams would therefore increase in intensity after heavy-metal
substitution if all interference were positive. In fact, however, some interference is negative; consequently, following
heavy-metal substitution, some spots measurably increase in intensity, others decrease, and many show no detectable
difference.

Phase differences between diffracted spots can be determined from intensity changes following heavy-metal
substitution. First, the intensity differences are used to deduce the positions of the heavy atoms in the crystal unit cell.
Fourier summations of these intensity differences give maps of the vectors between the heavy atoms, the so-called
Patterson maps. From these vector maps the atomic arrangement of the heavy atoms is deduced. From the positions
of the heavy metals in the unit cell, one can calculate the amplitudes and phases of their contribution to the diffracted
beams of protein crystals containing heavy metals.

This knowledge is then used to find the phase of the contribution from the protein in the absence of heavy-metal
atoms. As both the phase and amplitude of the heavy metals and the amplitude of the protein alone are known, as
well as the amplitude of the protein plus heavy metals (i.e., the protein heavy-metal complex), one phase and three
amplitudes are known. From this, the interference of the x-rays scattered by the heavy metals and protein can be
determined to be constructive or destructive. The extent of positive or negative interference, with knowledge of
the phase of the heavy metal, gives an estimate of the phase of the protein. Because two different phase angles are
determined and are equally good solutions, a second heavy-metal complex can be used which also gives two possible
phase angles. Only one of these will have the same value as one of the two previous phase angles; it therefore represents
the correct phase angle. In practice, more than two different heavy-metal complexes are usually made in order to give
a reasonably good phase determination for all reflections. Each individual phase estimate contains experimental errors
arising from errors in the measured amplitudes. Furthermore, for many reflections, the intensity differences are too small
to measure after one particular isomorphous replacement, and others can be tried.
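
The two-fold phase ambiguity, and its resolution with a second heavy-metal derivative, can be sketched numerically for a single reflection. The example below is a hedged illustration rather than the procedure of the invention. It assumes perfect isomorphism, so that the structure factor of the derivative is the sum of the protein and heavy-atom contributions, and error-free amplitudes. All numerical values and function names are hypothetical.

```python
import cmath
import math


def candidate_phases(fp_amp, fph_amp, fh):
    """Two possible protein phases (radians) from |F_P|, |F_PH| and the complex heavy-atom F_H.

    Uses |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P| |F_H| cos(phase_P - phase_H).
    """
    fh_amp, fh_phase = abs(fh), cmath.phase(fh)
    cos_term = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    cos_term = max(-1.0, min(1.0, cos_term))  # guard against rounding just outside [-1, 1]
    delta = math.acos(cos_term)
    return (fh_phase + delta, fh_phase - delta)


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs(cmath.phase(cmath.exp(1j * (a - b))))


def resolve_phase(fp_amp, derivatives):
    """Pick the phase shared by two derivatives; each derivative is (|F_PH|, complex F_H)."""
    first = candidate_phases(fp_amp, *derivatives[0])
    second = candidate_phases(fp_amp, *derivatives[1])
    _, best = min((angular_distance(a, b), a) for a in first for b in second)
    return best % (2.0 * math.pi)


# Hypothetical reflection: the "unknown" protein phase is 60 degrees.
true_fp = cmath.rect(100.0, math.radians(60.0))
fh1 = cmath.rect(30.0, math.radians(10.0))    # heavy-atom contribution, derivative 1
fh2 = cmath.rect(25.0, math.radians(140.0))   # heavy-atom contribution, derivative 2
derivatives = [(abs(true_fp + fh1), fh1), (abs(true_fp + fh2), fh2)]
estimate = resolve_phase(abs(true_fp), derivatives)
```

With measured data the two derivatives never agree exactly, which is one reason more than two heavy-metal complexes are usually prepared.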

The amplitudes and phases of the diffraction data from the protein crystals are used to calculate an electron-density
map of the repeating unit of the crystal. This map then has to be interpreted as a polypeptide chain with a
particular amino acid sequence. The interpretation of the electron-density map is made more complex by several
limitations of the data. First of all, the map itself contains errors, mainly due to errors in the phase angles. In addition,
the quality of the map depends on the resolution of the diffraction data, which in turn depends on how well-ordered
the crystals are. The resolution is measured in Å units; the smaller
this number is, the higher the resolution and therefore the greater the amount of detail that can be seen.

Building the initial model is a trial-and-error process. First, one has to decide how the polypeptide chain
weaves its way through the electron-density map. The resulting chain trace is a hypothesis, by which one tries to match
the density of the side chains to the known sequence of the polypeptide. When a reasonable chain trace has
finally been obtained, an initial model is built to give the best fit of the atoms to the electron density. Computer graphics
are used both for chain tracing and for model building to present the data and manipulate the models.

The initial model will contain some errors. Provided the protein crystals diffract to high enough resolution,
most or substantially all of the errors are removed by crystallographic refinement of the model.
In this process, the model is changed to minimize the difference between the experimentally
observed diffraction amplitudes and those calculated for a hypothetical crystal containing the model (instead of the real
molecule).

Most x-ray structures are determined to a resolution between 1.7 Å and 3.5 Å.

An amino acid sequence is preferred for accurate x-ray structure determination.
See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
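
The agreement between observed diffraction amplitudes and those calculated from a model, which refinement seeks to improve, is conventionally summarized by the crystallographic R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|. The following is a minimal sketch of that conventional definition; the amplitude values are invented for illustration and the function name is an assumption.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum of | |Fobs| - |Fcalc| | divided by sum of |Fobs|."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Invented amplitudes for four reflections, before and after a refinement step.
observed = [120.0, 85.0, 40.0, 15.0]
before = [100.0, 95.0, 52.0, 8.0]
after = [117.0, 87.0, 41.0, 14.0]
improved = r_factor(observed, after) < r_factor(observed, before)
```

A value of 0.0 corresponds to exact agreement; lower values after refinement indicate that the model accounts better for the measured data.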
Isolated PPCA and pPPCA Polypeptides 

A PPCA or pPPCA polypeptide can refer to any subset of a PPCA or pPPCA as a
fragment, consensus sequence or repeating unit thereof. A PPCA or pPPCA polypeptide of the present invention can
be prepared by, e.g.,

(a) recombinant DNA methods;

(b) proteolytic digestion of the intact molecule or a domain, subdomain or fragment thereof;

(c) chemical peptide synthesis methods well-known in the art; and/or

(d) any other method capable of producing a PPCA or pPPCA polypeptide
similar to a structural or functional subdomain of a PPCA or a pPPCA.






^^^^^mtyofPPCAorpPPCAcanbescreenedaccordingtoknownscreeningassavs Then,, 
PPPCA. such as prccong ^tiviij-. inhibiting ictMry a enzyme mivin Hon . . , 

According to the present invention, a PPCA or dPPCA ine.nHpc a „ * 
~.su C hasa,^ 

.-4 subdomains of the cap domain and/or , -4, subdomains of the core domain (as J^^T. 

value or combination thereof. Preferably 1-4 sets of each of « | M « or dimers). or any range, 

included. 31 leaSt ° ne COre or ca P do ~ or subdomains are 

The structure of a monomer or domain of at least one PPCA includes at | M « m ^ 
DPPCAofihpnr-. . ^ rtmcmaesal,easI one subdomain of a PPCA of » 

prrt.A of the present invention can include one or more of the following tuMnnun a - l a ^Aota 
Core domain subdomains: Cβ1, 21-27; Cβ2, 32-39; Cβ3, 50-54; Cα1, 63-67; Cβ4, 73-75; Cβ5, 82

r\T,Z 4 ' nXa2 ' " 8 " ,35:CP7 ' ,44 - M9;C03 - '5M63; CP8, .7,-,77; Ca4. 307-3.3; Ca5.'3,6-32,- 
Cα6, 336-341; Cα7, 350-359; Cβ9, 363-369; Cα8, 377-386; Cβ10, 391-401; Cβ11, 407-416; Cβ12, 419-424;
Cα9, 431-434; Cα10, 436-447; and

Cap domain subdomains: Hα1, 183-196; Hα2, 202-212; Hα3, 226-240; Mβ1, 261-264; Mβ2, 267-

domain are slightly different than in monomer 1. P 
A PPCA or pPPCA polypeptide of the invention can have at least 80% homology, such as 80-100% overall
homology or identity, with one or more corresponding PPCA or pPPCA subdomains or fragments as described herein,
such as a 4-542 amino acid fragment or portion of the amino acid sequence of Figures 13, 14 or 15. As would be

or pPPCA polypeptide of the invention, when expressed in a suitable host cell, or otherwise synthesized, to provide at least
one structural or functional feature of a native PPCA or pPPCA, such as at least one PPCA-related biological activity.

z lzr n :r b : assayed usins a sui,ab,e assay - ,o ^ a « - - ™ ■**•««. jzi 

o c PPCAs or pPPCAs of the invention. A PPCA or pPPCA polypeptide of the invention is not natura.ly clrrin! 

a r °T7 S " ^ ' PUnT,ed W M ^ ~ — in — E *-P'« PPCA° 

H aSSay m ^ **' CathePSi " A a aL J Bi °' Ch ™- ^•>^54-,4762 (.99,); Endothe^ 

deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992)); and tachykinin deamidase activity (Jackman
et al., J. Biol. Chem. 265:11265-11272 (1990)).

Percent homology or identity can be determined, for example, by comparing sequence information using the
GAP computer program, version 6.0, available from the University of Wisconsin Genetics Computer Group (UWGCG).
The GAP program utilizes the alignment method of Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), as revised
by Smith and Waterman (Adv. Appl. Math. 2:482 (1981)). Briefly, the GAP program defines similarity as the number
of aligned symbols (i.e., nucleotides or amino acids) which are similar, divided by the total number of symbols in the
shorter of the two sequences. The preferred default parameters for the GAP program include: (1) a unitary comparison
matrix (containing a value of 1 for identities and 0 for non-identities) and the weighted comparison matrix of Gribskov
and Burgess, Nucl. Acids Res. 14:6745 (1986), as described by Schwartz and Dayhoff, eds., ATLAS OF PROTEIN
SEQUENCE AND STRUCTURE, National Biomedical Research Foundation, pp. 353-358 (1979); (2) a penalty of 3.0
for each gap and an additional 0.10 penalty for each symbol in each gap; and (3) no penalty for end gaps.
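
The kind of calculation performed by such a program can be sketched as follows. This is a simplified illustration and not the GAP program itself: it assumes a Needleman-Wunsch global alignment with a unitary comparison matrix (1 for identities, 0 for non-identities) and a single flat penalty of 3 per gapped position, and it omits the per-symbol gap extension penalty and the free end gaps. Percent identity is then taken as identical aligned symbols divided by the length of the shorter sequence. The function names and test sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-3):
    """Global alignment of strings a and b; returns the two aligned strings ('-' marks a gap)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-"); i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1]); j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(a, b):
    """Identical aligned symbols divided by the length of the shorter sequence, times 100."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / min(len(a), len(b))
```

Under this definition a fragment that matches a longer sequence perfectly apart from one internal gap still scores 100%, because the divisor is the length of the shorter sequence.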

Thus, one of ordinary skill in the art, given the teachings and guidance presented in the present specification,
will know how to add, delete or substitute other amino acid residues in other positions of a PPCA or pPPCA to obtain
substituted, deletional or additional variants thereof.

Non-limiting examples of substitutions of a PPCA or pPPCA domain or peptide of the invention are those
in which at least one amino acid residue in the protein molecule has been removed and a different residue added in its
place according to the following Table 2. The types of substitutions which can be made in the protein or peptide
molecule of the invention can be based on analysis of the frequencies of amino acid changes between a homologous
protein of different species, such as those presented in Figure 15. Based on such an analysis, alternative substitutions are
defined herein as exchanges within one of the following five groups:

1. Small aliphatic, nonpolar or slightly polar residues: Ala, Ser, Thr (Pro, Gly);

2. Polar, negatively charged residues and their amides: Asp, Asn, Glu, Gln;

3. Polar, positively charged residues: His, Arg, Lys;

4. Large aliphatic, nonpolar residues: Met, Leu, Ile, Val (Cys); and

5. Large aromatic residues: Phe, Tyr, Trp.
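
The five exchange groups can be encoded as a simple lookup for checking whether a proposed substitution falls within a single group. This is an illustrative sketch only. The residues shown in parentheses in the list above (Pro, Gly, Cys) are treated here as ordinary members of their groups, which is an assumption, and the function names are hypothetical.

```python
# Exchange groups as listed above, keyed by group number (three-letter residue codes).
EXCHANGE_GROUPS = {
    1: {"Ala", "Ser", "Thr", "Pro", "Gly"},   # small aliphatic, nonpolar or slightly polar
    2: {"Asp", "Asn", "Glu", "Gln"},          # polar, negatively charged and their amides
    3: {"His", "Arg", "Lys"},                 # polar, positively charged
    4: {"Met", "Leu", "Ile", "Val", "Cys"},   # large aliphatic, nonpolar
    5: {"Phe", "Tyr", "Trp"},                 # large aromatic
}


def group_of(residue):
    """Return the group number containing the residue, or None if it is not listed."""
    for number, members in EXCHANGE_GROUPS.items():
        if residue in members:
            return number
    return None


def is_within_group_exchange(original, replacement):
    """True if the two residues differ and belong to the same exchange group."""
    group = group_of(original)
    return original != replacement and group is not None and group == group_of(replacement)
```

For example, Leu to Ile is an exchange within group 4, whereas Asp to Lys crosses from group 2 to group 3.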

Most deletions and additions, and substitutions according to the present invention are those which do not produce
radical changes in the characteristics of the protein or peptide molecule. "Characteristics" is defined in a non-inclusive
manner to define both changes in secondary structure, e.g., α-helix or β-sheet, as well as changes in physiological
activity, e.g., in biological activity assays. However, when the exact effect of the substitution, deletion, or addition is
to be confirmed, one skilled in the art will appreciate that the effect of at least one substitution, addition or deletion will
be evaluated by at least one PPCA or pPPCA screening assay, such as, but not limited to, immunoassays or bioassays,
to confirm at least one PPCA or pPPCA biological activity.

Surprisingly, a PPCA and/or a pPPCA is now discovered to have serine carboxypeptidase activity and
corresponding structural features, although having only about 30% sequence identity to wheat and yeast serine
carboxypeptidases. The serine carboxypeptidases are members of the hydrolase fold family (Liao et al., Biochemistry
"^^(^Endn^^^ 

The serine carboxypeptidases have peptidase activity at acidic pH (pH 4.5-5.5) as well as deamidase and esterase
activities at pH 7 (reviewed in Breddam et al., Carlsberg Res. Commun. 51:83-128 (1986); Rawlings & Barrett, Methods
in Enzymology (1994)). Enzymatic assays have revealed that only the mature form

of PPCA possesses a serine carboxypeptidase activity, which is similar to that of lysosomal cathepsin A and has a
preference for hydrophobic substrates such as the dipeptide Phe-Ala (Galjart et al., J. Biol. Chem. 266:14754-14762
(1991)). On the basis of sequence alignments with members of the serine carboxypeptidase family, mutagenesis studies
and the structure determination of pPPCA, the catalytic triad in PPCA has now been determined to be formed by the
residues Ser 150, His 429 and Asp 372.
PPCA and pPPCA Expression for Isolation and Purification

A nucleic acid sequence encoding a PPCA or a pPPCA (Galjart et al., Cell 54:755-764 (1988)) can be
recombined with vector DNA in accordance with conventional techniques, including blunt-ended or staggered-ended
termini for ligation, restriction enzyme digestion to provide appropriate termini, filling in of cohesive ends as
appropriate, alkaline phosphatase treatment to avoid undesirable joining, and ligation with appropriate ligases.
Techniques for such manipulations are disclosed, e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual,
Second Edition, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989); and Ausubel et al., Current Protocols
in Molecular Biology, Wiley Interscience, N.Y., (1988-1995) and are well known in the art.

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a polypeptide if it contains
nucleotide sequences which contain transcriptional and translational regulatory information and such sequences are
"operably linked" to nucleotide sequences which encode the polypeptide. An operable linkage is a linkage in which the
regulatory DNA sequences and the DNA sequence sought to be expressed are connected in such a way as to permit gene
expression as a PPCA, pPPCA or fragment thereof, in recoverable amounts. The precise nature of the regulatory
regions needed for gene expression can vary from organism to organism, as is well known in the analogous art. See,
e.g., Sambrook, infra and Ausubel, infra.

The invention accordingly encompasses the expression of a PPCA or a pPPCA, in either prokaryotic or
eukaryotic cells, although eukaryotic expression is preferred. Preferred hosts are bacterial or eukaryotic hosts including
bacteria, yeast, insects, fungi, bird and mammalian cells either in vivo, or in situ, or host cells of mammalian, insect, bird
or yeast origin. It is preferred that the mammalian cell or tissue is of human, primate, hamster, rabbit, rodent, cow, pig,
sheep, horse, goat, dog or cat origin, but any other mammalian cell can be used.

Eukaryotic hosts can include yeast, insects, fungi, and mammalian cells either in vivo, or in tissue culture.
Preferred eukaryotic hosts can also include, but are not limited to, insect cells and mammalian cells either in vivo, or in
tissue culture. Preferred mammalian cells include oocytes, HeLa cells, cells of fibroblast origin such as VERO or
CHO-K1, or cells of lymphoid origin and their derivatives.

Mammalian cells provide post-translational modifications to protein molecules including correct folding or
glycosylation at correct sites. Mammalian cells which can be useful as hosts include cells of fibroblast origin such as
3T3, VERO or CHO, or cells of lymphoid origin, such as, but not limited to, the hybridoma
SP2/0-Ag14 or the murine myeloma P3-X63Ag8, hamster cell lines (e.g., CHO-K1 and progenitors, e.g.,
CHO-DUXB11) and their derivatives. One preferred type of mammalian cells are cells which are intended to replace the
function of the genetically deficient cells in vivo. Neuronally derived cells are preferred for gene therapy of disorders
of the nervous system. For a mammalian cell host, many possible vector systems are available for the expression of
at least one PPCA or pPPCA. A wide variety of transcriptional and translational regulatory sequences can be employed,
depending upon the nature of the host. The transcriptional and translational regulatory signals can be derived from viral
sources, such as, but not limited to, adenovirus, bovine papilloma virus, Simian virus, or the like, where the regulatory
signals are associated with a particular gene which has a high level of expression. Alternatively, promoters from
mammalian expression products, such as, but not limited to, actin, collagen or myosin, can be employed.

When live insects are to be used, silk moth caterpillars and baculoviral vectors are presently preferred hosts 
for large scale PPCA or pPPCA production according to the invention. Production of PPCA or pPPCA in insects can 
be achieved, for example, by infecting the insect host with a baculovirus engineered to express transmembrane 
polypeptide by methods known to those skilled in the related arts. See Ausubel, infra, §§16.8-16.11.

In a preferred embodiment, the introduced nucleotide sequence will be incorporated into a plasmid or viral 
vector capable of autonomous replication in the recipient host. Any of a wide variety of vectors can be employed for 
this purpose. See, e.g., Ausubel et al., infra, §§ 1.5, 1.10, 7.1, 7.3, 8.1, 9.6, 9.7, 13.4, 16.2, 16.6, and 16.8-16.11. Factors
of importance in selecting a particular plasmid or viral vector include: the ease with which recipient cells that contain
the vector can be recognized and selected from those recipient cells which do not contain the vector; the number of
copies of the vector which are desired in a particular host; and whether it is desirable to be able to "shuttle" the vector
between host cells of different species. 

Different host cells have characteristic and specific mechanisms for the translational and post-translational 
processing and modification (e.g., glycosylation, cleavage) of proteins. Appropriate cell lines or host systems can be 
chosen to ensure the desired modification and processing of the foreign protein expressed. For example, expression in 
a bacterial system can be used to produce an unglycosylated core protein product. Expression in yeast will produce a 
glycosylated product. Expression in mammalian cells can be used to ensure "native" glycosylation of the heterologous 
PPCA or pPPCA. Furthermore, different vector/host expression systems can effect processing reactions such as 
proteolytic cleavages to different extents. 

As discussed above, expression of PPCA or pPPCA in eukaryotic hosts requires the use of eukaryotic regulatory
regions. Such regions will, in general, include a promoter region sufficient to direct the initiation of RNA synthesis.
See, e.g., Ausubel, infra; Sambrook, infra.

Once the vector or nucleic acid molecule containing the construct(s) has been prepared for expression, the DNA 
construct(s) can be introduced into an appropriate host cell by any of a variety of suitable means, i.e., transformation, 
transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate-precipitation,
direct microinjection, and the like. After the introduction of the vector, recipient cells are grown in a selective medium,
which selects for the growth of vector-containing cells. Expression of the cloned gene molecule(s) results in the 
production of a PPCA or pPPCA. This can take place in the transformed cells as such, or following the induction of 
these cells to differentiate (for example, by administration of bromodeoxyuracil to neuroblastoma cells or the like). 

A PPCA or pPPCA, or fragments thereof, of this invention can be obtained by expression from recombinant
DNA according to known methods. Alternatively, a PPCA or pPPCA can be purified from biological material. A PPCA
or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse liver, pig kidney,
bovine testes, bovine liver, and the like) of various genus and species.

The PPCA or pPPCA can be isolated and purified in accordance with conventional method steps, such as 
extraction, precipitation, chromatography, affinity chromatography, electrophoresis, or the like. For example, cells
expressing at least one PPCA or pPPCA in suitable levels can be collected by centrifugation, or with suitable buffers,
lysed, and the protein isolated by column chromatography, for example, on DEAE-cellulose, phosphocellulose,
polyribocytidylic acid-agarose, hydroxyapatite or by electrophoresis or immunoprecipitation. Alternatively, a pPPCA
or PPCA can be isolated by the use of antibodies, such as, but not limited to, a PPCA- or pPPCA-specific antibody. Such
antibodies can be obtained by known method steps (see, e.g., Harlow and Lane, ANTIBODIES: A LABORATORY
MANUAL, Cold Spring Harbor Laboratory (1988); Colligan et al., eds., Current Protocols in Immunology, Greene
Publishing Assoc. and Wiley Interscience, N.Y., (1992, 1993), the contents of which references are entirely incorporated
herein by reference).

A PPCA or a pPPCA can be purified from different mammalian tissues (e.g., human placenta, rat liver, mouse
liver, pig kidney, bovine testes, bovine liver, and the like) of various genus and species, using known techniques such
as gel filtration, phase separation and affinity chromatography, e.g., using polyclonal or monoclonal antibodies specific
for a PPCA or pPPCA, according to known methods. See, e.g., Oxender et al., Protein Engineering, Liss, New York
(1986).

Overview of PPCA or pPPCA Purification and Crystallization Methods 

In general, a PPCA or pPPCA is isolated in soluble form in sufficient purity and concentration (e.g., a monomer
or dimer) for crystallization. The PPCA or pPPCA is then isolated and assayed for biological activity (e.g., cathepsin
A) and for lack of aggregation (which interferes with crystallization). The purified PPCA or pPPCA preferably runs
as a single band for each monomer under reducing or nonreducing polyacrylamide gel electrophoresis (PAGE)
(nonreducing is used to evaluate the presence of cysteine bridges).

The purified PPCA or pPPCA is preferably crystallized under varying conditions of at least one of the
following: pH, buffer type, buffer concentration, salt type, polymer type, polymer concentration, other precipitating
ligands and concentration of purified PPCA or pPPCA. See, e.g., known methods (Blundell et al., Protein
Crystallography, Academic Press, London (1976); Oxender, infra; McPherson, The Preparation and Analysis of Protein
Crystals, Wiley Interscience, N.Y. (1982)) or methods provided in a commercial kit, such as CRYSTAL SCREEN
(Hampton Research, Riverside, CA). The crystallized PPCA protein can optionally be tested for at least one PPCA
activity and differently sized and shaped crystals are further tested for suitability for x-ray diffraction. Generally, larger
crystals provide better crystallographic data than smaller crystals, and thicker crystals provide better crystallographic
data than thinner crystals. See, e.g., Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff et al., Diffraction
Methods for Biological Macromolecules, vols. 114-115, Methods in Enzymology, Academic Press, Orlando, FL (1985).
Protein Crystallization Methods 

The hanging drop method is preferably used to crystallize the purified protein. See, e.g., Blundell, infra;
Oxender, infra; McPherson, infra; Wyckoff, infra; Taylor et al., J. Mol. Biol. 226:1287-1290 (1992); Takimoto et al.
(1992), infra; CRYSTAL SCREEN, Hampton Research.

A mixture of the purified protein and precipitant can include the following: 

• pH (e.g., 7-9);

• buffer type (e.g., tromethamine (TRIZMA), sodium azide (NaN3), phosphate, sodium, or cacodylate
acetates, imidazole, Tris HCl, sodium hepes);

• buffer concentration (e.g., 1-100 mM);

• salt type (e.g., sodium azide, calcium chloride, sodium citrate, magnesium chloride, ammonium
acetate, ammonium sulfate, potassium phosphate, magnesium acetate, zinc acetate, calcium acetate);

• polymer type and concentration (e.g., polyethylene glycol (PEG) 1-50%, type 400-10,000);

• other additives (salts: potassium, sodium, tartrate, ammonium sulfate, sodium acetate, lithium sulfate,
sodium formate, sodium citrate, magnesium formate, sodium phosphate, potassium phosphate;
organics: 2-propanol; non-volatile: 2-methyl-2,4-pentanediol; β-octyl glucoside); and

• concentration of purified PPCA or pPPCA (e.g., 1.0-100 mg/ml).
See, e.g., CRYSTAL SCREEN, Hampton Research.
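
Screening over several of these variables amounts to enumerating combinations of conditions. The following minimal sketch generates such a grid; the particular values are illustrative picks from within the ranges listed above (pH 7-9, PEG 1-50%, protein 1.0-100 mg/ml), not conditions taught by the specification, and the function and key names are hypothetical.

```python
from itertools import product


def screening_grid(ph_values, peg_percents, protein_mg_ml):
    """Return one dict per combination of pH, PEG concentration and protein concentration."""
    return [
        {"pH": ph, "PEG_percent": peg, "protein_mg_ml": protein}
        for ph, peg, protein in product(ph_values, peg_percents, protein_mg_ml)
    ]


# Illustrative values only, chosen inside the ranges given in the list above.
grid = screening_grid(
    ph_values=[7.0, 8.0, 9.0],
    peg_percents=[2, 10, 30],
    protein_mg_ml=[5.0, 10.0],
)
```

Each entry of the grid corresponds to one hanging drop to be set up; further variables such as buffer type or salt type can be added as extra arguments in the same way.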

A non-limiting example of such crystallization conditions is the following: 

• purified PPCA or pPPCA protein (e.g., 5 mg/ml);
• two (2) solutions in serial mixtures:

(1) 40-80 mM TRIZMA, 0.05-2.0 mM NaN3;

(2) 2-30% polyethylene glycol (PEG) 8000 buffered with 40-80 mM TRIZMA and
0.05-2.0 mM NaN3;

• 0.05-0.5% β-octyl glucoside;

• at an overall pH of about 8.0-8.3.

The above mixtures are used and screened by varying at least one of pH, buffer type, buffer concentration,
precipitating salt type or additive or their concentrations, PEG type, PEG concentration, and protein concentration.
Crystals ranging in size from 0.1-0.9 mm are formed in 1-14 days. These crystals diffract x-rays to at least 10 Å
resolution, such as 0.15-10.0 Å, or any range or value therein, such as 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4 or 3.5 Å, with 3.5 Å or higher being preferred for the highest resolution. In addition
to diffraction patterns having this highest resolution, lower resolution, such as 25-3.5 Å, can also be used. See, e.g.,
Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
Protein Crystals 

Crystals appear after 1-14 days and continue to grow on subsequent days. Some of the crystals can be
optionally removed, washed, and assayed for biological activity (e.g., PPCA), which activity is preferred for using in
further characterizations. Other washed crystals are preferably run on a gel and stained, and those that migrate in the
same position as the purified PPCA or pPPCA are preferably used. From two to one hundred crystals are observed in
one drop and crystal forms can occur, such as, but not limited to, orthorhombic, bipyramidal, rhomboid, and cubic. Initial
x-ray analyses indicate that such crystals diffract at moderately high to high resolution. When fewer crystals are
produced in a drop, they can be much larger size, e.g., 0.4-0.9 mm. See, e.g., Blundell, infra; Oxender, infra;
McPherson, infra; Wyckoff, infra.

PPCA and pPPCA X-ray Crystallography Methods

The crystals so produced for a PPCA or pPPCA are x-ray analyzed using a suitable x-ray source. Diffraction
patterns are obtained. Crystals are preferably stable for at least 10 hrs in the x-ray beam. Frozen crystals (e.g., -220
to -50°C) are optionally used for longer x-ray exposures (e.g., 5-72 hrs), the crystals being relatively more stable to the
x-rays in the frozen state. To collect the maximum number of useful reflections, multiple frames are optionally collected
as the crystal is rotated in the x-ray beam, e.g., for 5-72 hrs. Larger crystals (>0.2 mm) are preferred, to increase the
resolution of the x-ray diffraction patterns obtained. Crystals are preferably analyzed using a synchrotron high energy
x-ray source. Using frozen crystals, x-ray diffraction data is collected on crystals that diffract to at least a relatively high
resolution of 10-1.5 Å, with lower resolutions also useful, such as 25-10 Å, sufficient to solve the three-dimensional
structure of a PPCA or pPPCA in considerable detail, as presented herein.

Passing an x-ray beam through a crystal produces a diffraction pattern as a result of the x-rays interacting and
being scattered by the contents of the crystal. The diffraction pattern can be visualized using, e.g., an image plate or
film, resulting in an image with spots corresponding to the diffracted x-rays. The positions of the spots in the diffraction
pattern are used to determine parameters intrinsic to the crystal (such as unit cell parameters) and to gain information on
the packing of the molecules in the crystal. The intensity of the spots contains the Fourier transformation of the
molecules in the crystal, i.e., information on each atom in the crystal and hence of the crystallized molecule.

After data collection of diffraction patterns, the data is processed. This includes measuring the spots on each
diffraction pattern in terms of position and intensity. This information is processed (i.e., mathematical operations are
performed on the data (such as scaling, merging and converting the data from intensity of diffracted beams to
amplitudes)) to yield a set of data which is in a form as can be used for the further structure determination of the
molecule crystallized. The amplitudes of the diffracted x-rays are then combined with calculated phases to produce an
electron density map of the contents of the crystal. In this electron density map, the structure of the molecules (as
present in the crystal) is built. The phases can be determined with various known techniques, one being molecular
replacement.
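
The scaling, merging and intensity-to-amplitude steps mentioned above can be sketched as follows. This is a deliberately simplified illustration: it assumes one multiplicative scale factor per frame, merges repeated measurements of a reflection by a plain average, and takes the amplitude as the square root of the merged intensity, with negative averages clamped to zero. Practical data-processing programs apply further corrections that are not shown. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def merge_and_convert(observations, scale_factors):
    """Scale, merge and convert intensities to amplitudes.

    observations: iterable of (frame_id, (h, k, l), intensity)
    scale_factors: {frame_id: multiplicative scale for that frame}
    Returns {(h, k, l): amplitude}.
    """
    scaled = defaultdict(list)
    for frame_id, hkl, intensity in observations:
        scaled[hkl].append(intensity * scale_factors[frame_id])  # scaling
    amplitudes = {}
    for hkl, values in scaled.items():
        merged = sum(values) / len(values)                       # merging
        amplitudes[hkl] = math.sqrt(max(merged, 0.0))            # intensity -> amplitude
    return amplitudes


# Hypothetical measurements: reflection (1, 0, 0) was recorded on two frames.
observations = [
    (0, (1, 0, 0), 100.0),
    (1, (1, 0, 0), 50.0),
    (0, (0, 1, 0), 16.0),
]
amplitudes = merge_and_convert(observations, scale_factors={0: 1.0, 1: 2.0})
```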

For the molecular replacement technique, one takes a known three-dimensional structure thought to share
structural homology with the structure to be determined and, after calculations, generates a first set of initial phases. These
phases are then combined with the diffraction information of the molecule whose structure is to be solved.
The result is an electron density map of the molecules in the crystal from which the diffraction patterns originate.

The phases can be further optimized using a technique called density modification, which allows electron 
density maps of better quality to be produced facilitating interpretation and model building therein. The atomic model 
is then refined by allowing the atoms in the model to move in order to match the diffraction data as well as possible 
while continuing to satisfy stereochemical constraints (sensible bond lengths, bond angles and the like). See, e.g.,
Blundell, infra; Oxender, infra; McPherson, infra; Wyckoff, infra.
Computer Related Embodiments 

An amino acid sequence of a PPCA or pPPCA and/or atomic coordinate/x-ray diffraction data useful for
computer structure determination of a PPCA, pPPCA or a portion thereof, can be "provided" in a variety of media
to facilitate use thereof. As used herein, "provided" refers to a manufacture which contains a PPCA or pPPCA amino acid
sequence and/or atomic coordinate/x-ray diffraction data of the present invention, e.g., the amino acid sequence provided
in Figures 13-15, a representative fragment thereof, or an amino acid sequence having at least 80-100% overall identity
to a 5-542 amino acid fragment of an amino acid sequence of Figures 13-15. Such a manufacture provides the amino acid
sequence and/or atomic coordinate/x-ray diffraction data in a form which allows a skilled artisan to analyze and
determine the three-dimensional structure of a PPCA, a pPPCA or a subdomain thereof.

In one application of this embodiment, PPCA, pPPCA, or at least one subdomain thereof, amino acid sequence 
and/or atomic coordinate/x-ray diffraction data of the present invention is recorded on computer readable media. As 
used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer.
Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and 
magnetic tape; optical storage media such as optical discs or CD-ROM; electrical storage media such as RAM and ROM; 
and hybrids of these categories such as magnetic/optical storage media. A skilled artisan can readily appreciate how any 
of the presently known computer readable media can be used to create a manufacture comprising computer readable 
medium having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present 
invention. 

As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled 
artisan can readily adopt any of the presently known methods for recording information on computer readable medium 
to generate manufactures comprising an amino acid sequence and/or atomic coordinate/x-ray diffraction data information 
of the present invention. 

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium 
having recorded thereon an amino acid sequence and/or atomic coordinate/x-ray diffraction data of the present invention.
The choice of the data storage structure will generally be based on the means chosen to access the stored information.
In addition, a variety of data processor programs and formats can be used to store the sequence and x-ray data
information of the present invention on computer readable medium. The sequence information can be represented in
a word processing text file, formatted in commercially-available software such as WordPerfect and MICROSOFT Word,
or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like.
A skilled artisan can readily adapt any number of data processor structuring formats (e.g., text file or database) in order
to obtain computer readable medium having recorded thereon the information of the present invention. 
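
As an illustration of recording sequence information in the form of an ASCII file, the following minimal Python sketch writes a sequence to a plain text file and reads it back. The FASTA-style layout, the function names and the placeholder sequence are assumptions made for illustration only; the placeholder is not the PPCA sequence of Figures 13-15.

```python
import os
import tempfile


def record_sequence(path, name, sequence, width=60):
    """Write one sequence to a plain ASCII file in a FASTA-style layout."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(">" + name + "\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


def read_sequence(path):
    """Read back the (name, sequence) pair written by record_sequence."""
    with open(path, "r", encoding="ascii") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    return lines[0][1:], "".join(lines[1:])


# Hypothetical 70-residue placeholder, NOT the PPCA sequence of Figures 13-15.
TOY_SEQUENCE = "ACDEFGHIKLMNPQRSTVWY" * 3 + "ACDEFGHIKL"


def round_trip(sequence):
    """Record a sequence in a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "toy_sequence.txt")
        record_sequence(path, "toy_sequence", sequence)
        return read_sequence(path)
```

The same record could equally be stored as a row in a database table; the text file is used here only because it needs no additional software.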

By providing computer readable media having stored therein a PPCA or pPPCA sequence and/or atomic
coordinates based on x-ray diffraction data, a skilled artisan can routinely access the sequence and atomic coordinate
or x-ray diffraction data to model a PPCA, pPPCA, a subdomain thereof, or a ligand thereof. Computer algorithms are
publicly and commercially available which allow a skilled artisan to access this data provided on a computer readable
medium and analyze it for structure determination and/or RDD. See, e.g., Biotechnology Software Directory, Mary Ann
Liebert Publ., New York (1995).

The present invention further provides systems, particularly computer-based systems, which contain the 
sequence and/or diffraction data described herein. Such systems are designed to do structure determination and RDD 
for a PPCA, pPPCA or at least one subdomain thereof. Non-limiting examples are microcomputer workstations 
available from Silicon Graphics Incorporated and Sun Microsystems running Unix-based, Windows NT or IBM OS/2
operating systems. 

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage 
means used to analyze the sequence and/or atomic coordinate/x-ray diffraction data of the present invention. The
minimum hardware means of the computer-based systems of the present invention comprises a central processing unit
(CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate which of the
currently available computer-based systems are suitable for use in the present invention. A monitor is optionally provided
to visualize structure data. 

As stated above, the computer-based systems of the present invention comprise a data storage means having
stored therein a PPCA, pPPCA or fragment sequence and/or atomic coordinate/x-ray diffraction data of the present
invention and the necessary hardware means and software means for supporting and implementing an analysis means.
As used herein, "data storage means" refers to memory which can store sequence or atomic coordinate/x-ray diffraction
data of the present invention, or a memory access means which can access manufactures having recorded thereon the
sequence or x-ray data of the present invention.

As used herein, "search means" or "analysis means" refers to one or more programs which are implemented 
on the computer-based system to compare a target sequence or target structural motif with the sequence or x-ray data 
stored within the data storage means. Search means are used to identify fragments or regions of a PPCA or pPPCA 
which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly and a
variety of commercially available software for conducting search means is available and can be used in the computer-based
systems of the present invention. A skilled artisan can readily recognize that any one of the available algorithms or
implementing software packages for conducting computer analyses can be adapted for use in the present computer-based
systems.

As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or 
combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration or electron
density map which is formed upon the folding of the target motif. There are a variety of target motifs known in the art.
Protein target motifs include, but are not limited to, enzymic active sites, structural subdomains, epitopes, functional
domains and signal sequences. A variety of structural formats for the input and output means can be used to input and
output the information in the computer-based systems of the present invention. 

A variety of comparing means can be used to compare a target sequence or target motif with the data storage 
means to identify structural motifs or interpret electron density maps derived in part from the atomic coordinate/x-ray
diffraction data. A skilled artisan can readily recognize that any one of the publicly available computer modeling 
programs can be used as the search means for the computer-based systems of the present invention. 
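
A search means of the kind described above can be sketched, under stated assumptions, as a pattern match of a target motif against a stored sequence. In the Python sketch below the stored sequence and the motif pattern are hypothetical placeholders chosen for illustration; they are not taken from Figures 13-15, and the regular-expression approach is only one of many possible comparing means.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based position, matched text) for every occurrence, overlaps included."""
    # A lookahead lets overlapping occurrences of the motif be reported.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# Hypothetical stored sequence and target motif ("." matches any residue).
TOY_SEQUENCE = "MKTAGESYAGLLPWGHSGGKT"
TOY_MOTIF = "G.S.G"

hits = find_motif(TOY_SEQUENCE, TOY_MOTIF)
```

Here `hits` holds one entry, the motif starting at residue 15 of the placeholder sequence. A structural motif search would compare coordinates rather than letters, but the data flow (stored data, target, comparing means, reported matches) is the same.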

One application of this embodiment is provided in Figure 22. Figure 22 provides a block diagram of a 
computer system 102 that can be used to implement the present invention. The computer system 102 includes a
processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented
as random access memory, RAM) and a variety of secondary storage memory 110, such as a hard drive 112, a removable
medium storage device 114, and a monitor 120. The removable medium storage device 114 may represent, for example, a
floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy
disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into
the removable medium storage device 114. The computer system 102 includes appropriate software for reading the
control logic and/or the data from the removable medium storage device 114 once inserted in the removable medium
storage device 114.

Amino acid, encoding nucleotide or other sequence and/or atomic coordinate/x-ray diffraction data of the 
present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 
110, and/or a removable storage device 116. Software for accessing and processing the amino acid sequence and/or
atomic coordinate/x-ray diffraction data (such as search tools, comparing tools, etc.) resides in main memory 108 during
execution. The monitor 120 is optionally used to visualize the structure data. 
Structure Determination 

One or more computational steps, computer programs and/or computer algorithms are used to build a molecular 
3-D model of a PPCA or pPPCA, using amino acid sequence data from Figures 13-15 (or variants thereof) and/or atomic 
coordinate/x-ray diffraction data, as presented herein.

In x-ray crystallography, x-ray diffraction data and phases are combined to produce electron density maps in 
which the three-dimensional structure of a PPCA or pPPCA is then built or modeled. This structure can then be used
for RDD of modulators of at least one PPCA- or pPPCA-related activity that is relevant to at least one PPCA- or
pPPCA-related pathology. 
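
The way amplitudes (from the diffraction data) and phases are combined into an electron density map can be illustrated with a one-dimensional toy Fourier synthesis. The Python sketch below is an assumption-laden simplification: the unit cell is one-dimensional, the two point "atoms" and their weights are hypothetical, and the phases are computed from the known toy model rather than obtained experimentally or by molecular replacement.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Complex structure factors F(h) for point atoms (x, weight) in a 1-D unit cell."""
    return {
        h: sum(w * cmath.exp(2j * math.pi * h * x) for x, w in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density_map(amplitudes, phases, n_grid):
    """Fourier synthesis: rho(x) = sum over h of |F(h)| * cos(phi(h) - 2*pi*h*x)."""
    rho = []
    for k in range(n_grid):
        x = k / n_grid
        rho.append(sum(amplitudes[h] * math.cos(phases[h] - 2.0 * math.pi * h * x)
                       for h in amplitudes))
    return rho


# Hypothetical toy model: two point atoms at fractional positions 0.20 and 0.65.
TOY_ATOMS = [(0.20, 8.0), (0.65, 6.0)]
factors = structure_factors(TOY_ATOMS, h_max=10)
amplitudes = {h: abs(f) for h, f in factors.items()}      # what diffraction measures
phases = {h: cmath.phase(f) for h, f in factors.items()}  # what must be determined
rho = density_map(amplitudes, phases, n_grid=200)
peak_x = rho.index(max(rho)) / 200
```

The strongest peak of the synthesized map falls at the position of the heavier toy atom, which is the sense in which a model is "built into" the density. Real maps are three-dimensional and are computed with packages such as those named in the text.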

Density Modification and Map Interpretation. Electron density maps can be calculated using such programs 
as those from the CCP4 computing package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory,
UK, 1979). Cycles of two-fold averaging can further be used, such as with the program RAVE (Kleywegt & Jones, in
Bailey et al., eds., First Map to Final Model, SERC Daresbury Laboratory, UK, pp. 59-66 (1994)) and gradual model
expansion. For map visualization and model building a program such as "O" (Jones (1991), infra) can be used.

Refinement and Model Validation. Rigid body and positional refinement can be carried out using a program
such as X-PLOR (Brünger (1992), infra), e.g., with the stereochemical parameters of Engh and Huber (Acta Cryst.
A47:392-400 (1991)). If the model at this stage in the averaged maps still misses residues (e.g., at least 5-10 per
subunit), then some or all of the missing residues can be incorporated in the model during additional cycles of positional
refinement and model building. The refinement procedure can start using data from lower resolution 25-10 Å to
10-3.0 Å and then be gradually extended to include data from 12-6 Å to 3.0-1.5 Å. B-values (also termed temperature
factors) for individual atoms can be refined once data of 2.8 Å or higher (e.g., up to 1.5 Å) has been added. Subsequently,
waters can be gradually added. A program such as ARP (Lamzin and Wilson, Acta Cryst. D49:129-147 (1993)) can be
used to add crystallographic waters and as a tool to check for bad areas in the model. Programs such as PROCHECK
(Laskowski et al., J. Appl. Cryst. 26:283-291 (1993)), WHATIF (Vriend, J. Mol. Graph. 8:52-56 (1990)) and PROFILE
3D (Lüthy et al., Nature 356:83-85 (1992)), as well as the geometrical analysis generated by X-PLOR, can be used
to check the structure for errors. A program such as DSSP can be used to assign the secondary structure elements
(Kabsch and Sander (1983), infra).

The structure of a PPCA or pPPCA can thus be solved with the molecular replacement procedure, such as by
using X-PLOR (Brünger (1992), infra). A partial search model for the monomer can be constructed using a related
protein, such as the wheat serine carboxypeptidase structure (Liao et al. (1992), infra). The rotation and translation functions
can be solved to yield orientations and positions for the subunits in the crystallographic asymmetric unit. This allows
phases to be determined that, when combined with information from the x-ray diffraction patterns, allow electron
density maps of a PPCA or pPPCA to be calculated. The atomic model is then built using these electron density maps.
Cyclical two-fold density averaging can also be done to improve the electron density maps using a suitable program.
Model expansion can also be used to add missing residues for each monomer, resulting in a model with
95-99.9% of the total number of residues. The model can be refined in a program such as X-PLOR (Brünger (1992),
supra), to a suitable crystallographic R-factor. The model data is then saved on computer readable media for use in further
analysis, such as rational drug design.
Rational Design of Drugs that Interact with the PPCA or pPPCA

The determination of the three-dimensional structure of a PPCA or pPPCA, as described herein, provides a 
basis for the design of new and specific ligands for the diagnosis and/or treatment of at least one PPCA- or pPPCA- 
related pathology. 

Several approaches can be taken for the use of the crystal structure of a PPCA or pPPCA in the rational design 
of ligands of this protein. A computer-assisted, manual examination of the active site structure is optionally done.
Software such as GRID (Goodford, J. Med. Chem. 28:849-857 (1985)), a program that determines probable
interaction sites between probes with various functional group characteristics and the enzyme surface, is used to
analyze the active site to determine structures of inhibiting compounds. The program calculations, with suitable
inhibiting groups on molecules (e.g., protonated primary amines) as the probe, are used to identify potential hotspots
around accessible positions at suitable energy contour levels. Suitable ligands, as inhibiting or stimulating modulating
compounds or compositions, are then tested for modulating activities of at least one PPCA or pPPCA.

A diagnostic or therapeutic PPCA or pPPCA modulating ligand of the present invention can be, but is not 
limited to, at least one selected from a nucleic acid, a compound, a protein, an element, a lipid, an antibody, a saccharide, 
an isotope, a carbohydrate, an imaging agent, a lipoprotein, a glycoprotein, an enzyme, a detectable probe, an antibody
or fragment thereof, or any combination thereof, which can be detectably labeled as for labeling antibodies. Such labels
include, but are not limited to, enzymatic labels, radioisotope or radioactive compounds or elements, fluorescent
compounds or metals, chemiluminescent compounds and bioluminescent compounds. Alternatively, any other known
diagnostic or therapeutic agent can be used in a method of the invention. 

After preliminary experiments are done to determine the K_m of the substrate with each enzyme activity of a
PPCA or pPPCA, the time-dependent nature of modulation and the ligand K_i values are determined (e.g., by the method of
Henderson (Biochem. J. 127:321-333 (1972))). For example, the ligand (or blank where appropriate) and enzyme
are pre-incubated in buffer. Reactions are initiated by the addition of substrate. Aliquots are removed over a suitable
time course and each is quenched by addition to a suitable quenching solution (e.g., sodium hydroxide in
aqueous ethanol). The concentration of product is determined, e.g., fluorometrically, using a spectrometer. Plots of
fluorescence against time can be close to linear over the assay period, and are used to obtain values for the initial velocity
in the presence (V_i) or absence (V_0) of ligand. Error is present in both axes in a Henderson plot, making it inappropriate
for standard regression analysis (Leatherbarrow, Trends Biochem. Sci. 15:455-458 (1990)). Therefore, K_i values are
obtained from the data by fitting to a modified version of the Henderson equation for competitive inhibition:

\[ Q r^{2} + (E - Q - I)\,r - E = 0 \]

where (using the notation of Henderson (Biochem. J. 127:321-333 (1972))):

\[ Q = K_i \left( \frac{A_t + K_a}{K_a} \right) \quad \text{and} \quad r = \frac{V_0}{V_i} \]

This equation is solved for the positive root with the constraint that

\[ Q = K_i \left( \frac{A_t + K_a}{K_a} \right) \]

using PROC NLIN from SAS (SAS Institute Inc., Cary, North Carolina, USA), which performs nonlinear regression using
least-squares techniques. The iterative method used is optionally the multivariate secant method, similar to the Gauss-Newton
method, except that the derivatives in the Taylor series are estimated from the history of iterations rather than
supplied analytically. A suitable convergence criterion is optionally used, e.g., where there is a change in loss function
of less than 10^-8.
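
The fitting step can be sketched as follows, assuming the quadratic Q r^2 + (E - Q - I) r - E = 0 with Q = K_i (A_t + K_a)/K_a and r = V_0/V_i, and assuming that E and I denote total enzyme and total ligand concentrations, A_t the substrate concentration and K_a its Michaelis constant. This Python sketch does not reproduce PROC NLIN or the multivariate secant method; it swaps in a simple golden-section search over log10(K_i), and all numerical values are hypothetical.

```python
import math


def henderson_ratio(k_i, a_t, k_a, e_t, i_t):
    """Positive root r = V0/Vi of Q*r**2 + (E - Q - I)*r - E = 0."""
    q = k_i * (a_t + k_a) / k_a
    b = e_t - q - i_t
    s = math.sqrt(b * b + 4.0 * q * e_t)
    # Two algebraically equal forms; pick the one without cancellation.
    if b >= 0.0:
        return 2.0 * e_t / (b + s)
    return (s - b) / (2.0 * q)


def fit_k_i(i_totals, ratios, a_t, k_a, e_t,
            log_lo=-12.0, log_hi=-3.0, iterations=100):
    """Least-squares estimate of K_i by golden-section search on log10(K_i)."""

    def loss(log_k):
        k_i = 10.0 ** log_k
        return sum((henderson_ratio(k_i, a_t, k_a, e_t, i) - r) ** 2
                   for i, r in zip(i_totals, ratios))

    shrink = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = log_lo, log_hi
    for _ in range(iterations):
        left = hi - shrink * (hi - lo)
        right = lo + shrink * (hi - lo)
        if loss(left) < loss(right):
            hi = right
        else:
            lo = left
    return 10.0 ** ((lo + hi) / 2.0)


# Hypothetical assay values in mol/l, chosen only to exercise the code.
A_T, K_A, E_T, TRUE_K_I = 1e-4, 5e-5, 1e-9, 2e-9
I_TOTALS = [0.0, 2e-9, 5e-9, 1e-8, 2e-8]
RATIOS = [henderson_ratio(TRUE_K_I, A_T, K_A, E_T, i) for i in I_TOTALS]
ESTIMATE = fit_k_i(I_TOTALS, RATIOS, A_T, K_A, E_T)
```

With noise-free synthetic ratios the search recovers the K_i used to generate them, and with no ligand present the ratio is 1, as expected. Measured data would add scatter, which is where a regression package with proper error estimates becomes necessary.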

Once modulating ligands are found and isolated or synthesized, crystallographic studies of the compounds
complexed to a PPCA or pPPCA can be performed. As a non-limiting example, PPCA or pPPCA crystals are soaked
for 2 days in 0.01-100 mM ligand and x-ray diffraction data are collected on an area detector and/or an image plate
detector (e.g., a Mar image plate detector) using a rotating anode x-ray source. Data are collected to as high a resolution
as possible, e.g., an inner limit of diffraction of 1.5-3.5 Å. An atomic model of the inhibitor is built into the difference
Fourier map (F_inhibitor complex - F_native). The model can be refined to adjust the atomic positions to improve the fit with the
electron density maps, while maintaining correct stereochemical constraints. The model will preferably have low r.m.s.
deviations from the ideal bond lengths and angles, as well as a low R-factor (preferably less
than about 25-35%, such as less than about 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, or 25%).
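
For orientation, the conventional crystallographic R-factor compares observed and model-calculated structure factor amplitudes. The Python sketch below assumes the usual definition, R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|), and assumes both amplitude lists are already on a common scale; the three reflections are hypothetical numbers, not data from a PPCA crystal.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for three reflections, already on a common scale.
TOY_F_OBS = [100.0, 50.0, 25.0]
TOY_F_CALC = [90.0, 55.0, 25.0]
toy_r = r_factor(TOY_F_OBS, TOY_F_CALC)
```

For these toy values the result is 15/175, about 8.6%. Refinement programs additionally apply scale factors and resolution cutoffs, which this sketch leaves out.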

Direct measurements of enzyme inhibition provide further confirmation that the modeled ligands are 
modulators of at least one biological activity of a PPCA or a pPPCA. As a non-limiting example, a modification (Chong
et al., Biochim. Biophys. Acta 1077:65-1 \ (1991)) of the fluorometric assay of Potier et al. (Analyt. Biochem. 94:287-296
(1979)) is optionally used to measure neuraminidase inhibition or stimulation, optionally including determination
of inhibition constants (K_i). Other suitable PPCA activity assays include, e.g., cathepsin A activity (Galjart et al., J. Biol.
Chem. 266:14754-14762 (1991)); endothelin I deamidase activity (Jackman et al., J. Biol. Chem. 267:2872-2875 (1992));
and tachykinin deamidase activity (Jackman et al., J. Biol. Chem. 265:11265-11272 (1990)).

Ligands of a PPCA or pPPCA, based on the crystal structure of this enzyme, are thus also provided by the 
present invention. A PPCA or pPPCA ligand is any molecule, compound or composition that is capable of associating 
with a PPCA or pPPCA and optionally modulating at least one function or structural feature of a PPCA or pPPCA. 
Preferably, a PPCA or pPPCA ligand modulates at least one biological activity of a PPCA or pPPCA. Demonstration
of clinically useful levels of, e.g., in vivo activity is also important. In evaluating PPCA or pPPCA inhibitors for biological
activity in animal models (e.g., rat, mouse, rabbit), various oral and parenteral routes of administration are
evaluated. Using this approach, it is expected that modulation of a PPCA or pPPCA occurs in suitable animal models,
using the ligands discovered by structure determination and x-ray crystallography. 
Evaluation of Therapeutic Potentials of Compositions via a PPCA Animal Model

The present invention also provides methods for identifying diagnostic or therapeutic ligands of PPCA or 
pPPCA via computer RDD, to treat a PPCA-related pathology. Generally, a method for determining the therapeutic or 
diagnostic use of a PPCA or pPPCA modulating ligand, to treat a PPCA related pathology, comprises the steps of 
administering a known dose of at least one ligand-containing composition to an animal model having a phenotype
corresponding to a PPCA-related pathology, monitoring the appropriate biological or biochemical parameters, and
comparing the results from treated animals to those of untreated animals. Results indicating the onset or presence of a
PPCA related pathology are generally referred to herein as "symptoms" of the disease. See, e.g., U.S. Appl. No.
08/397,693, filed March 2, 1995, which is entirely incorporated herein by reference. 

Appropriate biological and biochemical parameters that reflect the onset and progression of a PPCA related 
pathology include, but are not limited to, (1) gross biological parameters, e.g., physical appearance (i.e., flattening of 
the face, rough haircoat and/or subcutaneous swelling in affected animals) or growth (reduced weight gain); (2) gross 
behavioral parameters, e.g., lack of coordination; (3) biochemical assays, e.g., assays of cathepsin A, N-acetyl-α-neuraminidase
or β-galactosidase activities in primary cultures of skin fibroblasts or tissue homogenates; (4)
histopathological studies (visceromegaly, i.e., enlarged liver and spleen; accumulation of secondary vacuoles in kidney
tissues; etc.).

A first method of evaluating the therapeutic potential of a composition using the transgenic non-human animals 
of the invention comprises the steps of: 

(1) Administering a known dose of the composition to a first non-human animal having a 
phenotype corresponding to a human PPCA related pathology; 

(2) Detecting the time of onset of symptoms in the first non-human animal; and

(3) Comparing the time of onset of symptoms in the first non-human animal to the time of onset
of symptoms in a second non-human animal having a phenotype corresponding to a human PPCA related
pathology, which has not been exposed to the composition;

wherein a statistically significant delay in the time of onset of symptoms in the first non-human animal relative to the
time of onset of the symptoms in the second non-human animal indicates the potential of the composition for treating
a PPCA related pathology.

A second method of evaluating the therapeutic potential of a composition using the non-human animals of the 
invention comprises the steps of: 

(1) Administering a known dose of the composition to a first non-human animal having a 
phenotype corresponding to a human PPCA related pathology at an initial time, t0;

(2) Determining the extent of symptoms in the first non-human animal at a later time, t1; and

(3) Comparing, at t1, the extent of symptoms in the first non-human animal to the extent of
symptoms in a second non-human animal having a phenotype corresponding to a human PPCA related 
pathology, which has not been exposed to the composition at t^ 

wherein a statistically significant decrease in the extent of symptoms at t1 in the first non-human animal relative to the
extent of the symptoms at t1 in the second non-human animal indicates the potential of the composition for treating a
PPCA related pathology. 

In the above methods, the composition being tested may comprise a chemical compound administered by 
circulatory injection or oral ingestion. The composition being evaluated may alternatively comprise a polypeptide
administered by circulatory injection of an isolated or recombinant bacterium or virus that is live or attenuated, wherein 
the polypeptide is present on the surface of the bacterium or virus prior to injection, or a polypeptide administered by 
circulatory injection of an isolated or recombinant bacterium or virus capable of reproduction within a non-human 
animal, and the polypeptide is produced within a non-human animal by genetic expression of a DNA sequence encoding 
the polypeptide. Alternatively, the composition being evaluated may comprise one or more nucleic acids, including a 
gene from the human genome or a processed RNA transcript thereof. Similarly, the composition being evaluated may 
comprise cells removed from a mammal and genetically engineered to overexpress a lysosomal protein or some other 
therapeutic polypeptide. 

Once the PPCA modulating ligand has been shown to be effective in an animal model, it can then be tested in 
human clinical trials, according to known method steps. 

In the above methods, delivery of the composition being tested to non-human animals is achieved via means 
appropriate for the composition being tested, e.g., by diet; by intermittent or continuous intravenous injection of one or 
more of the compositions or of a liposome (Rahman and Schein, in Liposomes as Drug Carriers, Gregoriadis, ed., John
Wiley, New York (1988), pages 381-400; Gabizon, A., in Drug Carrier Systems, Vol. 9, Roerdink et al., eds., John
Wiley, New York (1989), pages 185-212) or microparticle (Tice et al., U.S. Patent 4,542,025 (Sep. 17, 1985))
formulation comprising one or more of the compositions; via subdermal implantation of drug-polymer conjugates
(Duncan, R., Anti-Cancer Drugs 3:175-210 (1992)); via microparticle bombardment (Sanford et al., U.S. Patent
4,945,050 (Jul. 31, 1990)); via infusion pumps (Blackshear and Rohde, in Drug Carrier Systems, Vol. 9, Roerdink et
al., eds., John Wiley, New York (1989), pages 293-310) or by other appropriate means known in the art (see, generally,
Remington's Pharmaceutical Sciences, 18th Ed., Gennaro, ed., Mack Publishing Co., Easton, PA (1990)).
Pharmaceutical/Diagnostic Administration 

Using compounds or compositions comprising at least one PPCA or pPPCA modulating ligand, the present
invention further provides a method for modulating the activity of a PPCA or pPPCA protein in a cell. In general,
ligands (antagonists or agonists) which have been identified to inhibit or enhance the activity of at least one PPCA or
pPPCA can be formulated so that the ligand can be contacted with a cell expressing at least one PPCA or pPPCA
protein in vivo. The contacting of such a cell with such a ligand results in the in vivo modulation of at least one
biological activity of a PPCA or pPPCA.

At least one PPCA or pPPCA modulating compound or composition of the invention can be administered by 
any means that achieve the intended purpose, using a suitable pharmaceutical composition or formulation. For example,
administration can be by various parenteral routes such as subcutaneous, intravenous, intradermal, intramuscular,
intraperitoneal, intranasal, intracranial, transdermal, or buccal routes. Alternatively, or concurrently, administration can
be by the oral route. Parenteral administration can be by bolus injection or by gradual perfusion over time. 

A typical regimen for treatment or prophylaxis comprises administration of an effective amount over a period 
of one or several days, up to and including between one week and about six months. It is understood that the dosage 
of a diagnostic/pharmaceutical compound or composition of the invention administered in vivo or in vitro will be
dependent upon the age, sex, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of
treatment, and the nature of the diagnostic/pharmaceutical effect desired. The ranges of effective doses provided herein
are not intended to be limiting and represent preferred dose ranges. However, the most preferred dosage will be tailored
to the individual subject, as is understood and determinable by one skilled in the relevant arts. See, Berkow et al.,
eds., The Merck Manual, 16th edition, Merck and Co., Rahway, N.J., 1992; Goodman et al., eds., Goodman and
Gilman's The Pharmacological Basis of Therapeutics, 8th edition, Pergamon Press, Inc., Elmsford, N.Y. (1990); Avery's
Drug Treatment: Principles and Practice of Clinical Pharmacology and Therapeutics, 3rd edition, ADIS Press, LTD.,
Williams and Wilkins, Baltimore, MD (1987); Ebadi, Pharmacology, Little, Brown and Co., Boston (1985); Osol et al.,
eds., Remington's Pharmaceutical Sciences, 18th edition, Mack Publishing Co., Easton, PA (1990); Katzung, Basic and
Clinical Pharmacology, Appleton and Lange, Norwalk, CT (1992), which references are entirely incorporated herein
by reference. 

The total dose required for each treatment can be administered by multiple doses or in a single dose. The
diagnostic/pharmaceutical compound or composition can be administered alone or in conjunction with other diagnostics
and/or pharmaceuticals directed to the pathology, or directed to other symptoms of the pathology. Effective amounts
of a diagnostic/pharmaceutical compound or composition of the invention are from about 0.1 µg to about 100 mg/kg
body weight, administered at intervals of 4-72 hours, for a period of 2 hours to 1 year, and/or any range or value therein.

The recipients of administration of compounds and/or compositions of the invention can be any mammals.
Among mammals, the preferred recipients are mammals of the Orders Primata (including humans, apes and monkeys),
Artiodactyla (including horses, goats, cows, sheep, pigs), Rodentia (including mice, rats, rabbits, and hamsters) and
Carnivora (including cats and dogs). The most preferred recipients are humans.

Having now generally described the invention, the same will be more readily understood through reference 
to the following example which is provided by way of illustration, and is not intended to be limiting of the present
invention.

Example 1: Preparation, Purification and Crystallization of PPCA or pPPCA from Human Cells

The present invention provides, in one aspect, the determination of the three-dimensional structure of the human 
protective protein/cathepsin A (PPCA) in the precursor form (pPPCA) by a combination of molecular replacement and
twofold density averaging. The structure presented here is the first of an enzyme associated with a human PPCA related
pathology, and the third human lysosomal enzyme structure determined. The structure gives us insight into the zymogen
activation mechanism of pPPCA, as well as the expected 3-D structure of PPCA and its specific and new enzymatic
activities. 

PPCA and pPPCA Expression and Purification

Plasmid Constructs. AcMNPV transfer-plasmids pJR2 and pBC3 (Figure 1) were derivatives of plasmid
pAc373, carrying the entire polyhedrin gene (Smith et al., 1985). In pJR2 a polylinker with a number of multiple
cloning sites (MCS) was inserted directly 3' of the polyhedrin promoter, and substituted a 33-nucleotide deletion of the




WO 97/15588 PCT/US96/17325 

polyhedrin gene, starting with the ATG. pBC3 had the polylinker situated in a similar position as pJR2, but instead of
the 33-nt deletion this plasmid featured an ATG codon mutated to ACG. Full-length human PPCA cDNA, PPCA54
(Galjart et al., 1988), and the two deletion cDNA mutants, 32(Δ20) and 20(Δ32) (Galjart et al., 1991), were subcloned
either in pJR2 or pBC3 as EcoRI fragments, using standard procedures (Sambrook et al., 1989) (Figure 1). The
20(Δ32) deletion mutant was tagged with the human PPCA signal sequence, as reported earlier (Galjart et al., 1991).
All cDNA fragments were engineered to have short 3' and 5' untranslated regions (< 10 bp).

Transfection and Selection of Recombinant Baculovirus. Spodoptera frugiperda insect cells (IPLB-SF21)
were cultured in monolayers at 27°C in TNM-FH medium (Hink, 1970), supplemented with 10% FBS and antibiotics
(complete medium). Wild-type (wt) AcMNPV virus strain E2 (Smith and Summers, 1978) and recombinant
baculoviruses were propagated on confluent monolayers of Sf21 cells. Recombinant constructs AcPPCA54, AcPPCA32
and AcPPCA20 were generated by cotransfecting Sf21 cells with 1 µg wt-AcMNPV DNA and 10 µg plasmid DNA,
using the calcium phosphate method, modified for insect cells (Graham et al., 1973; Carstens et al., 1980; Summers et
al., 1987). Polyhedrin-negative recombinant baculoviruses were then selected and purified by sequential
plaque assays, and verified by dot blot and Southern blot analysis (Summers et al., 1987). Large quantities of inoculum
were produced by infection of insect cells at 25-50% confluency, with recombinant virus at a multiplicity of infection
(MOI) of < 1 pfu/cell. After 3 to 6 days at 27°C, when all cells appeared infected, the medium was harvested and
centrifuged for 5 min at 1000 rpm to remove detached cells. The titre of the inoculum was determined by plaque assay
analysis. 

Protein purification and western blotting. Sf21 cells were cultured in either 175 cm² or 500 cm² flasks (triple
flask, Nunc) to near confluency, and infected with recombinant baculoviruses at a MOI of 5-10 pfu/cell. After I 5 h
incubation at 27°C, the inoculum was replaced with complete medium for an additional 8 to 10 h. Cell monolayers were
then rinsed with PBS and cultured further for 38 h in unsupplemented Grace's medium. After infection the medium was
collected, centrifuged for 5 min at 1500 g, and for 1 h at 100,000 g (Beckman SW-28 rotor) to remove virus particles.
After centrifugation the supernatant was concentrated 20-fold in an Amicon stirred cell. Glycoproteins were purified
to about 60% using a concanavalin A-SEPHAROSE affinity chromatography column, as described earlier (Verheijen et al.,
1982). Total protein concentration was measured using the method of Smith et al. (1985). Aliquots of the purified
preparation were resolved on 12.5% SDS-polyacrylamide gels under reducing and non-reducing conditions. Gels were
either Coomassie brilliant blue- or silver-stained (Sambrook et al., 1989). For western blotting, proteins were transferred
from gels to IMMOBILON PVDF membranes (Millipore Corp.), using a semidry blotter (The W.E.P. company).

Development and Use of pPPCA antibodies. A 15 amino acid peptide (NH2-Cys-Met-Trp-His-Gln-Ala-Leu-Leu-
Arg-Ser-Glu-Asp-Lys-Ala-Arg-COOH) (Figure 5), based on the C-terminal sequence of the 34-kDa PPCA subunit
(amino acid 285-298, Galjart et al., 1988), was synthesized on a peptide synthesizer (Applied Biosystems), and
covalently linked to the carrier protein Keyhole Limpet Hemocyanin, using the IMJECT ACTIVATED IMMUNOGEN
CONJUGATION KIT (Pierce). Polyclonal antibodies against the conjugated product were raised in rabbit, by multiple
subdermal injections of the protein (40-125 µg) mixed with incomplete Freund's adjuvant (Pierce). Rabbits were bled
34 days after the first injection. The antibodies, designated anti-pep, were tested on immunoblots and by 
immunoprecipitations of baculovirus produced PPCA. 

Blots were incubated for at least 12 h in blocking buffer (0.01 M Tris-buffered saline pH 8.0 (TBS), 0.05%
Tween 20, and 3% (w/v) BSA), and subsequently probed for 2 h with polyclonal PPCA antibodies, anti-54, diluted 1:200
in fresh blocking buffer. They were then washed for 1 h in TBS, 0.05% Tween 20, and incubated for 2 h with alkaline
phosphatase-conjugated anti-rabbit IgG (Sigma, 1:1000 in blocking buffer). Proteins were visualized using alkaline
phosphatase substrate (Sigma; 4-aminodiphenylamine diazonium sulfate, naphthol AS-MX phosphate).

Crystallization of PPCA. Fractions containing the precursor form of the protein, as assayed on an SDS-PAGE
gel, were pooled. Subsequently the protein was concentrated to 5 mg/ml and the buffer exchanged to 50 mM NaAc pH
5.2 or 50 mM MES pH 6.5 using a CENTRICON-10. Crystals were grown using the hanging drop vapor diffusion
technique. Crystals suitable for data collection were grown using a reservoir solution containing 2-10% PEG 8000,
pH 8.0-8.3, 50 mM TRIZMA, 1 mM NaN3, 0.25% β-octyl glucoside at 4-12 °C. Mixing non-equal volumes of protein
solution (in the range 5-10 µl) and reservoir solution (in the range 2-6 µl) enhanced the occurrence of single large
crystals per drop under these crystallization conditions. The concentration of the protein solution before mixing was
5 mg/ml. Crystal growth was enhanced by macrocrystallization techniques (i.e., any measure that promotes the growth of
large crystals) and in some cases by micro- and macroseeding techniques.
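
Because the drops were set up from unequal volumes, the starting concentrations in a drop differ from those of the stock solutions. The following is a minimal sketch of that dilution arithmetic. The function name is hypothetical, and the particular volumes and PEG percentage are illustrative picks from within the ranges stated above, not values reported for any specific drop. It also assumes simple volume additivity and that the protein stock contains no PEG.

```python
def drop_concentrations(protein_mg_ml, protein_ul, peg_percent, reservoir_ul):
    """Initial concentrations in a hanging drop right after mixing.

    Assumes volumes are additive and that the protein solution
    contains no PEG (illustrative assumptions).
    """
    total_ul = protein_ul + reservoir_ul
    protein_in_drop = protein_mg_ml * protein_ul / total_ul
    peg_in_drop = peg_percent * reservoir_ul / total_ul
    return protein_in_drop, peg_in_drop


# Illustrative values only: 5 ul of 5 mg/ml protein plus 2 ul of 10% PEG 8000
protein, peg = drop_concentrations(5.0, 5.0, 10.0, 2.0)
print(f"protein: {protein:.2f} mg/ml, PEG 8000: {peg:.2f} %")
```

During vapor diffusion the drop then equilibrates against the reservoir, so these figures describe only the starting point.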

Example 2: Structure Determination of a pPPCA Crystallized from Human Cells
Data Collection, Data Processing and Reduction. 

To allow for data collection at cryotemperatures, the crystals were cryoprotected by adding glycerol in 5%-10%
steps to a solution of about 12% PEG 8000, 50 mM TRIZMA, pH 8.0, 1 mM NaN3, 0.25% β-octyl glucoside, which
served as an artificial mother liquor. The crystals were incubated for half an hour at 4 °C after each addition of
glycerol. The final mother liquor contained 30% glycerol. Increasing the glycerol concentration gradually was necessary to keep
the crystals from cracking.

Diffraction data were collected at the Stanford Synchrotron Radiation Laboratories (SSRL) to 2.0 Å at -178 °C
on a MAR imaging plate at a wavelength of 1.08 Å on beam-line 7-1. The diffraction coordinate data (corresponding
to the atomic coordinates of monomer 1; the other monomer's coordinates are provided by matrix conversion of these
coordinates, as presented herein) were processed and reduced using MOSFLM version 5.2 from the CCP4 program
package (SERC (UK) Collaborative Computing Project 4, Daresbury Laboratory UK, 1979). The program REFIX
(Kabsch (1993), infra) was used for auto-indexing. Using the CCP4 program suite (SERC (UK) Collaborative
Computing Project 4, Daresbury Laboratory UK, 1979), the intensities were scaled (ROTAVATA), merged
(AGROVATA), then converted to amplitudes and truncated with the program TRUNCATE. Statistics of the data
collected are given in Table 1. The Vm (Matthews, B.W., J. Mol. Biol. 33:491-497 (1968)) is 3.2 Å³/Da for 2 monomers
in the asymmetric unit, corresponding to a solvent content of 62%.
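
The link between the Matthews coefficient and the quoted solvent content can be checked with the usual approximation, solvent fraction = 1 - 1.23/Vm, with Vm in Å³/Da. The sketch below assumes that standard constant of 1.23, which corresponds to a typical protein partial specific volume and is not stated in the text above. The function name is hypothetical.

```python
def solvent_fraction(v_m, matthews_constant=1.23):
    """Approximate solvent fraction from the Matthews coefficient V_m (A^3/Da).

    Uses the common approximation 1 - 1.23 / V_m; the constant is an
    assumption based on a typical protein partial specific volume.
    """
    return 1.0 - matthews_constant / v_m


# V_m = 3.2 A^3/Da, as given for two monomers in the asymmetric unit
print(f"solvent content: {solvent_fraction(3.2):.0%}")
```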
Molecular Replacement 

Search Model: The best molecular replacement results were obtained using a multi-Ala core as a search probe.
The 'multi-Ala core' search model was constructed from the atomic coordinates of the CPW monomer (Liao et al., 1992),
based on the sequence alignment as presented in Figure 15. Regions expected to deviate in structure between PPCA and
CPW were deleted from the model (i.e. with low sequence identity or located in loops). The 125 residues identical in
PPCA and CPW were left in the model; 112 residues were truncated to alanine. The remaining 94 residues, though
differing between CPW and PPCA, were considered sufficiently similar in size, and the CPW residue was left as such in the
model. The resulting 'multi-Ala core' monomer consisted of 331 residues, constituting a large portion of the core domain
and providing little atomic information for the 'cap' domain (see Figure 1). The model contained 30% of the expected protein
scattering mass given the fact that there are two monomers in the asymmetric unit. The sequence identity between this
search model and the true PPCA structure was 37.7%.
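
The residue counts given for the search model can be cross-checked with a little arithmetic: the three categories should add up to the 331 residues of the 'multi-Ala core', and the identical residues as a fraction of that total should come out close to the quoted sequence identity. The variable names below are illustrative only.

```python
identical = 125      # residues identical in PPCA and CPW, left in the model
to_alanine = 112     # residues truncated to alanine
kept_similar = 94    # differing residues of similar size, CPW residue kept

total = identical + to_alanine + kept_similar
identity_percent = 100.0 * identical / total

print(f"search model size: {total} residues")
print(f"identity to PPCA: {identity_percent:.2f} %")
```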

Rotation Function, PC Refinement and Translation Function: Native data from 8 to 4 Å were used in the
molecular replacement calculations. The rotational searches utilized a real space Patterson search method as
implemented in X-PLOR (Steigemann, 1974; Huber, 1985; Brünger, 1992a) with a Patterson vector cutoff of 21 Å. The
self-rotation function failed to reveal any non-crystallographic two-fold symmetry relating two monomers in the
asymmetric unit. In addition, the native self Pattersons did not reveal the presence of a non-crystallographic two-fold
axis parallel to a crystallographic axis. These results indicated that the two monomers in the asymmetric unit might not
form a dimer together. The cross-rotation function was carried out to find the orientation of the two monomers in the
asymmetric unit as follows. Patterson vector sets were calculated for the search model and the native data, and the 8000
strongest Patterson vectors were used in the rotation function. The rotational space, restricted to the asymmetric unit of
the rotation function according to Rao et al. (1980), was sampled by rotating the Patterson vectors from the search model
around Eulerian angles θ1, θ2 and θ3, while sampling θ2 in angular grid intervals of 2.5°. The 5000 highest rotation
function grid points resulting from the product function of the two Patterson vector sets were selected. Grid points
differing by less than 8° around any given axis were then clustered. The result was a list of 169 possible solutions for
the rotation function, each corresponding to a set of three angles describing an orientation. The two top solutions were
3.9 and 3.8 sigma above the mean. PC-refinement (Brünger, 1990) was carried out to optimize each of the 169 possible
solutions using the complete search model as a single rigid body. This yielded two orientations with a PC-index of 0.043
and 0.051, respectively. The orientations of these solutions were (θ1 = 261.4, θ2 = 36.22, θ3 = 147.28) and
(θ1 = 18.52, θ2 = 47.40, θ3 = 23.22), respectively. In contrast, the rest of the possible solutions yielded an average PC-index
of 0.022.

Individual translation function calculations were performed on a 1 Å grid. A translational solution was found
for each orientation at positions (x=33.30, y=51.97, z=12.79) and (x=25.23, y=28.58, z=22.02), with respect
to the crystallographic center, at 7.7σ and 8.8σ, respectively, above the mean. The R-factor for the individual solutions was
55.6% and 54.8% in the resolution range 8.0 to 4.0 Å, with a correlation coefficient (CC) of 0.095 and 0.114. A
combined translation function was calculated to place each solution relative to the same crystallographic origin, resulting
in an R-factor of 52.8% for data between 8.0 and 4.0 Å, bringing the R-factor down to 51.3% and increasing the CC to 0.22.
The molecular packing was assessed on a graphics workstation, which revealed no clashes between the placed search
probes. However, a very large amount of empty space was present. The packing showed that the asymmetric unit
contained two half dimers, each forming a dimer with another monomer in a neighboring unit cell. The two cores in
the asymmetric unit were related by κ = 73° around an axis tilted 15.5° off the crystallographic a axis lying in the ax
plane.
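
The R-factor and correlation coefficient quoted for the translation solutions compare observed and calculated structure factor amplitudes. The sketch below uses the conventional definitions, R = Σ| |Fobs| - k|Fcalc| | / Σ|Fobs| and a Pearson correlation on amplitudes. The exact targets and scaling used by the programs in the calculations above are not stated, so this is an illustrative assumption, and the amplitude lists are made up.

```python
from math import sqrt


def r_factor(f_obs, f_calc):
    """Conventional R-factor with a single overall scale k = sum(Fo) / sum(Fc)."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(f_obs, f_calc):
    """Pearson correlation coefficient between two lists of amplitudes."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((fo - mean_o) * (fc - mean_c) for fo, fc in zip(f_obs, f_calc))
    var_o = sum((fo - mean_o) ** 2 for fo in f_obs)
    var_c = sum((fc - mean_c) ** 2 for fc in f_calc)
    return cov / sqrt(var_o * var_c)


# Made-up amplitudes, for illustration only
f_obs = [120.0, 85.0, 40.0, 200.0, 66.0]
f_calc = [100.0, 90.0, 55.0, 170.0, 80.0]
print(f"R = {r_factor(f_obs, f_calc):.3f}, CC = {correlation(f_obs, f_calc):.3f}")
```

An R-factor above 50% with a CC near 0.1 to 0.2, as reported above, is typical of a correct but very incomplete model at this stage.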

Iterative Model Building and Two-fold Averaging 

Initial Electron Density Map: A 2m|Fobs| - D|Fcalc| SigmaA weighted map (Read, 1986) was calculated using
|Fobs| and phases from the molecular replacement solution. The map was contoured at 1σ and showed good density
for most of the core. Density emerged for many side chains where the input model residue had been an Ala, indicating
that the molecular replacement solution was correct.

First Model Built: The two rotated and translated search probes formed the starting point for model building
of the PPCA precursor. The non-crystallographic symmetry (NCS) matrix was determined between the two cores using
the "Lsq_explicit" option in the computer program O (Jones et al., 1991). Subsequently a 'best monomer' was built by
superimposing the electron densities from each monomer core, and adjusting the model accordingly. Residues were only
incorporated in the model where the electron density was visible for the complete side chain. Residues from the search
model for which no density was visible were removed. An alanine was built in the model at places where electron
density for a side chain was partial. In this manner 294 residues, i.e. 65% of the Cα atoms, were built in the 'best
monomer' core. The second monomer was generated from the 'best monomer' model using the NCS operator relating
the two monomers in the asymmetric unit. At this point the data set was partitioned into a working set and a test set
consisting of 5% of the reflections between 8 and 2.2 Å to monitor the Rfree (Brünger et al., 1992b). The working data set
was used for rigid body and positional refinement. For averaging and map calculations the unpartitioned data set was
used. Twenty-five cycles of refinement using the two 'best monomer cores' positioned in the asymmetric unit as rigid
bodies and data from 8.0-3.0 Å resulted in an R-factor of 53.5% for this resolution range. The atomic coordinates of this
partial model were used to calculate a new 2m|Fobs| - D|Fcalc| SigmaA weighted map, which we called the 'best monomer
map'.
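
Setting aside a 5% test set for cross-validation can be sketched as follows. This is a hedged illustration only: the text does not say how the test reflections were chosen, so random selection with a fixed seed is an assumption, and the function name and the toy list of Miller indices are hypothetical.

```python
import random


def split_test_set(reflections, test_fraction=0.05, seed=0):
    """Randomly set aside a fraction of reflections as a cross-validation test set.

    Returns (working_set, test_set). Random selection is an assumption;
    the source only states that 5% of the reflections formed the test set.
    """
    rng = random.Random(seed)
    n_test = round(test_fraction * len(reflections))
    test = set(rng.sample(reflections, n_test))
    working = [hkl for hkl in reflections if hkl not in test]
    return working, sorted(test)


# Hypothetical Miller indices, for illustration only
reflections = [(h, k, l) for h in range(10) for k in range(10) for l in range(10)]
working, test = split_test_set(reflections)
print(len(working), len(test))
```

Only the working set would then enter refinement, while the test set is used solely to compute the free R value.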

Averaging: Search for Missing Density: The phasing power from the rigid body refined 'best monomer
cores', consisting of 294 residues per core, was insufficient to bring back interpretable electron density for the missing
part of the model, 158 residues per monomer. To overcome this, a 'bootstrapping' procedure was applied, entailing
density averaging using RAVE (Kleywegt & Jones, 1994a) and model expansion. The 'best monomer map' and the rigid
body refined 'best monomer cores' served as the starting point for this procedure.

Six bootstrapping cycles were carried out, called bmc1 through bmc6, allowing for the model to be extended
in stepwise increments. Figure 16 shows a scheme of the steps incorporated in one bootstrapping cycle. After a cycle
in which the model had undergone major expansion, a new molecular mask was calculated with MAMA (Kleywegt &
Jones, 1994b) for use in the subsequent bootstrapping cycle. No phase recombination was applied between
bootstrapping cycles. At the end of each cycle the inverted phases αinv and inverted amplitudes Finv were discarded.
The NCS operator was re-optimized after cycle bmc3. The resolution range of the data included in the bootstrapping
cycle started with 15-3.0 Å for bmc1 and was gradually extended to 15-2.7 Å in bmc6. The bootstrapping procedure
is summarized in Table 2. To optimize the bootstrapping procedure, consideration was given to the molecular mask used
in the averaging, the model building strategy and the refinement procedure.

Molecular masks: Four different masks were constructed in total. The atomic radius of all atoms was set to
4 Å to calculate each mask. The masks were then manually modified using mask editing options in O (Jones et al., 1991).
Mask1 was constructed around the 'best monomer core'. Subsequently it was greatly enlarged by multiple blocks of
10-15 Å³ in the regions where the model was incomplete (Figure 17). This was crucial to prevent the density in the
insertion areas from being flattened during the averaging step. Approximately one half of the dimer interface was
estimated to be formed by regions from the missing cap domain. Major expansions of the mask in this area were made
to accommodate this. This resulted in a serious overlap problem when the mask was duplicated to cover a complete
dimer. The mask was reduced where overlap occurred with the "overlap_trim" option of MAMA. After several
bootstrapping cycles, newly incorporated polypeptide fragments were carefully assigned to one of the two monomers
forming the dimer, and the mask at the dimer interface areas was manually adjusted accordingly. Essentially the masks
were kept far too large in regions where the model was missing in order to avoid erroneous flattening of electron density.
In contrast, the masks were tightened around the areas of the molecule where the model was complete.

Model Building: A conservative model building strategy was adopted. Initially only side chains were mutated
in the core region to fit the PPCA amino acid sequence and, where the density was clear, poly-alanine fragments were
built in the insertion areas (loops and the cap domain). Newly included atoms were given a B-factor of 20 Å². Only
once models bmc5 and bmc6 were obtained was the electron density of sufficient quality to allow side chains to be
incorporated confidently in the cap domain (residues 190-303). At this stage the Cα trace was virtually complete for
the whole dimer and the sequence could be fit unambiguously.

Refinement: Positional refinement was postponed until after 3 cycles of bootstrapping, resulting in a final
model containing 91% of the Cα atoms. Forty steps of positional refinement were then carried out to improve the
geometry of the model. Subsequently only one of the refined monomers was taken and the other generated using NCS
operators. The rationale for delaying the positional refinement is addressed in the discussion.

Completing the model: deviations from two-fold symmetry. It was possible to add 148 residues and 185 side
chains per monomer after a total of 6 bootstrapping cycles. At this stage, each subunit contained 442 residues and 413
side chains, i.e. 98% of the Cα and 91% of the side chain atoms. The gradual model expansion as a function of the
bootstrapping cycle is shown in Figure 18.
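
The residue bookkeeping across the bootstrapping cycles can be verified from the numbers given in this section: 294 residues built initially with 158 missing per monomer, and 148 residues added over six cycles. The short sketch below, with illustrative variable names, reproduces the quoted completeness figures.

```python
built_initially = 294    # residues in the initial 'best monomer' core
missing_initially = 158  # residues per monomer missing at that point
added = 148              # residues per monomer added over six cycles

expected_total = built_initially + missing_initially
built_now = built_initially + added
still_missing = expected_total - built_now

print(f"residues per monomer: {built_now} of {expected_total}")
print(f"completeness: {built_initially / expected_total:.0%} -> {built_now / expected_total:.0%}")
print(f"still missing per asymmetric unit: {2 * still_missing}")
```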

Twenty residues were still missing in the asymmetric unit at this stage. These were localized to two stretches
per monomer (260-262 and 287-292). With most of the scattering mass incorporated, the monomers from model bmc6
were refined individually with X-PLOR (Brünger, 1992a) in an attempt to retrieve electron density for the still missing
residues. After 40 steps of positional refinement using data from 8.0-2.6 Å, the R-factor dropped significantly from 40.2%
to 33.2%. The model was further positionally refined using a full weight Wa on the crystallographic term. The data
included in the refinement were gradually extended to 2.2 Å. At 2.4 Å resolution individual B-factors were refined and
the distribution checked as a function of atom location (i.e., low B-factors in the core and high B-factors on the surface).
Cycles of refinement and refitting allowed for 18 missing residues to be added. Essentially almost the complete cap
domain was retrieved using the bootstrapping procedure, as shown in Figure 19. It became apparent from the refined
maps that the two stretches of missing amino acids adopted a very different conformation in the two monomers (with
as much as an average r.m.s.d. of 7.9 Å for the Cα's of residues 287-292). For this reason electron density for these
regions had not been retrieved in the two-fold averaging process. The stepwise improvement of the electron density
maps along with averaging, model expansion and refinement is shown in Figure 6.

The program ARP was used to check our model, in particular the region at the dimer interface (Lamzin &
Wilson, 1993). Prior to the final round of positional refinement, an |Fobs|/σ cutoff was applied to reject 10% of the
weakest data, as well as an anisotropic scale factor to offset the decreased resolution along the crystallographic a axis.
The final model is of good geometry with a final R-factor of 21.3% (Rfree of 26.8%) for data between 8.0 and 2.2 Å (see
Table 3). A Ramachandran plot is given in Figure 21. The r.m.s. coordinate error is 0.282 Å as calculated by SigmaA
(Read, 1986). The average phase difference between the initial molecular replacement model and the currently refined
model is calculated to be 71° for data between 10 and 2.2 Å.

The structure determination of PPCA is special in that two-fold averaging could be applied to refine very poor
molecular replacement phases, enabling us to retrieve electron density for 148 residues and 185 side chains per
monomer. In total 314 complete residues were added per asymmetric unit, equivalent to about 35 kDa of protein. In
retrospect we feel that a number of factors contributed to a successful structure determination.

Crystal Packing. Each monomer in the crystal is interacting with four non-crystallographically related
monomers. By far the most extensive contact is with a non-crystallographically related monomer generating the
physiological dimer. Three additional contacts are extensive crystal contacts ranging from 200-800 Å² averaged per
monomer. The largest non-dimer crystal contact involves the precursor loops from two crystallographically independent
monomers (regions 265-267 and 281-295 from monomer 1 with residues 281-293 from monomer 2) making intimate contact
with each other. Summed together, these loops create an intermolecular buried surface of 1680 Å². We believe that this
stabilizes an otherwise very flexible area, possibly explaining the good diffraction qualities of the P2₁2₁2 crystals.

It is also in this crystal contact that we find deviating spatial conformation and secondary structure between
the two monomers, as mentioned before. The electron density in this region is of very good quality, with average
temperature factors of 16.6 Å² for main chain and 18.3 Å² for side chains.

pPPCA and the Hydrolase Family. The fold of pPPCA belongs to the large hydrolase fold family containing
enzymes such as the serine carboxypeptidases, dehalogenase, various lipases and acetylcholine esterase (Ollis et al.
(1992), infra), having various different catalytic functions. Though the central core is the same (a central β-sheet flanked
by α-helices on both sides), the proteins in this family all seem to have different 'cap' domains, both with respect to fold
as well as size (Figure 7A-F). pPPCA has one of the largest cap domains, comprising 121 residues forming the three
helical bundle of the helical subdomain and a three stranded β-sheet of the maturation subdomain.

Major Differences and Comparison With the Serine Carboxypeptidases. The overall fold of the pPPCA
monomer is similar to that of the wheat and yeast serine carboxypeptidases (Endrizzi et al. (1994), infra; Ollis et al.
(1992), infra). The complete core domains of pPPCA and CPW superimpose with an r.m.s. deviation of 1.7 Å for 302
Cα atoms and 38% sequence identity. Deleting major deviating loops from the core domain allows for pPPCA to
superimpose with an r.m.s. deviation of 1.2 Å onto CPW and CPY (293 equivalent Cα's with 40% sequence identity for
CPW/pPPCA and 271 equivalent Cα's for CPY/pPPCA with 42.2% identity).
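
The r.m.s. deviations quoted for the superimposed Cα atoms follow the usual definition: the square root of the mean squared distance between equivalent atoms. A minimal sketch is given below. It assumes the two coordinate sets are already superimposed and listed in equivalent order; the least-squares fitting step itself is not shown, and the function name is hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """R.m.s. deviation between two equally long lists of (x, y, z) positions.

    Assumes both sets are already superimposed and in equivalent
    order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


# Two toy 'atoms', each displaced by 1 A along z (illustrative coordinates)
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```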

The cap domain in pPPCA differs significantly from the CPW and CPY counterparts. The pPPCA structure
reveals a large maturation subdomain not present in the structures of CPW and CPY, for which the structures of the
enzymatically active forms are known. All three enzymes contain a three-helical bundle in the cap domain. The sequence
identity between the three proteins in this region is very low (ca. 12%). In contrast, PPCA shows a much greater
deviation. Hα1 superimposes reasonably well with the CPW counterpart, maintaining the same general orientation with
respect to the core domain (requiring a rotation of only 7.4°). But helices Hα2 and Hα3 have undergone major rotations
with respect to Hα1 and the core domains by κ = 28.5° and κ = 93.4°, respectively (Figure 8A).

Due to the integral role of the cap domain in forming the dimer interface, the dimers of PPCA and CPW were
compared. In the pPPCA and CPW dimers the monomers are oriented differently with respect to each other.
Superposition of the core domain of one monomer from each dimer shows that the second pair of monomers (forming
the respective dimers) differ by a remarkable 15° in orientation (Figure 8B). Thus, it appears that the extensive
differences in the cap domains lead to a different arrangement of the subunits in the dimers of PPCA and CPW.
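
An orientation difference such as the rotations reported here can be expressed as a single angle κ once the rotation matrix relating the two bodies is known, using cos κ = (trace(R) - 1)/2. The sketch below illustrates that relation with a synthetic rotation about the z axis. The helper names are hypothetical, and the matrix is constructed for the example rather than taken from the structures.

```python
from math import acos, cos, degrees, radians, sin


def rotation_angle_deg(r):
    """Rotation angle kappa (degrees) of a 3x3 rotation matrix.

    Uses cos(kappa) = (trace - 1) / 2, clamped against rounding error.
    """
    trace = r[0][0] + r[1][1] + r[2][2]
    cos_kappa = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return degrees(acos(cos_kappa))


def rotation_about_z(angle_deg):
    """Synthetic rotation matrix about the z axis, for demonstration."""
    a = radians(angle_deg)
    return [[cos(a), -sin(a), 0.0],
            [sin(a), cos(a), 0.0],
            [0.0, 0.0, 1.0]]


print(round(rotation_angle_deg(rotation_about_z(15.0)), 3))
```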

Catalytic Triad and Enzymatic Mechanism. Our structure shows that the precursor PPCA has all the elements
proposed for the enzymatic machinery of the serine carboxypeptidase family (Liao et al. (1992), infra; Endrizzi et al.
(1994), infra), and is now discovered to be the third structure elucidated belonging to this family of enzymes after CPW
and CPY. The catalytic triad in the active site of pPPCA is formed by residues Ser 150, His 429 and Asp 372. The Oγ
of Ser 150 forms a good hydrogen bond with the Nε2 of His 429, with an N to O distance of 2.8 Å. The Nδ1 of His 429
is 2.7 Å removed from the Oδ2 and 3.3 Å from the Oδ1 of Asp 372. Further, two backbone amides appear to orient the
carboxylate group of Asp 372. The N of Ala 374 is at a distance of 3.0 Å to the Oδ1 of Asp 372 and the N of Cys 375
is at a distance of 2.9 Å to the Oδ2 of Asp 372.

The oxyanion hole proposed to stabilize the negatively charged tetrahedral intermediate in serine
carboxypeptidases is formed by the backbone amides of Gly 57 and Tyr 151 in PPCA. The 32 atoms of the catalytic
triad residues plus the oxyanion hole amides from PPCA, CPY and CPW superimpose with an r.m.s. deviation of 0.4
Å, indicating the very high degree of structural similarity of the active site in the PPCA precursor with those in the fully
active enzymes CPY and CPW (see Table 4). The carboxylate of Asp 372 and the imidazole of His 429 in PPCA are
non-planar, making an angle of approximately 60° between the imidazole and the carboxylate. A similar non-planarity
has been observed in CPW and CPY, in contrast to the planar orientation found in subtilisin- and trypsin-type serine
proteases (McPhalen et al., Biochemistry 27:6582-6598 (1988)).

In pPPCA, a pair of glutamic acid residues (Glu 69 and Glu 149) is positioned near the catalytic triad, with their
carboxylate groups interacting with each other. The carboxylate groups are located at approximately 8 Å from the Oγ
of Ser 150, and lie at the bottom of the active site. An asparagine (Asn 55) is orientated such that it forms a hydrogen
bond to each of the two carboxylate groups of the glutamic acid pair, at an Nδ2 (Asn) to Oε1/Oε2 (Glu) distance of 3.0 and
3.6 Å, respectively. In addition the two carboxylates interact with each other via hydrogen bonds. This configuration
of two glutamic acid residues and an asparagine is conserved between pPPCA, CPW and CPY (see Table 4), and has
been implicated in regulating the low pH optimum for the carboxypeptidase activity found in the serine
carboxypeptidases (Liao et al. (1992), infra). Biochemical data have suggested that a functional group with an apparent
pKa value of 5.5 functions to bind the C-terminal carboxylate group of peptide substrates and is responsible for the
observed pH optimum of 5.5 (reviewed in Breddam et al. (1986), infra; Rawlings & Barrett (1994), infra). Together
with their structural data, Liao and colleagues (Liao et al. (1992), infra) have suggested that at pH 5.5 or below, one or
both glutamates must be uncharged, while at a pH higher than 5.5 one or both of the carboxylates, which are orientated
opposite to each other, may become deprotonated, resulting in unfavorable electrostatic interactions. This would disturb
the hydrogen bonding pattern or result in structural perturbations causing the observed increase in Km for peptide
substrates at high pH. In pPPCA the orientation of this pair of glutamic acids as well as that of the asparagine is
essentially identical in structure to the equivalent residues in CPW and CPY (see Table 4), even though the structure has
been determined at pH 8. The CPW and CPY structures have been determined at pH 5.7 and at pH 6.5-7.0. Thus, our
structure appears to rule out large pH induced conformational changes of these three residues at least up to a pH value
2.5 units above that optimal for carboxypeptidase activity. However, the high degree of conservation of these residues
does indicate some role in a characteristic shared by all three enzymes.
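
The pH argument above can be made concrete with the Henderson-Hasselbalch relation, which gives the protonated fraction of a titratable group as 1/(1 + 10^(pH - pKa)). The sketch below applies it to a group with the apparent pKa of 5.5 mentioned in the text. It treats the group as a single, independent titrating site, which is a simplifying assumption: the coupled glutamate pair described here need not behave that way. The function name is hypothetical.

```python
def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch: fraction of an acidic group that is protonated.

    Assumes a single independent titrating site (a simplification).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Apparent pKa of 5.5, evaluated at the pH optimum and at the pH values
# at which the structures discussed in the text were determined
for ph in (5.5, 5.7, 6.5, 8.0):
    print(f"pH {ph}: {protonated_fraction(ph, 5.5):.3f} protonated")
```

Under this simple model the group is half protonated at pH 5.5 and almost fully deprotonated at pH 8, which is why the unchanged geometry observed at pH 8 is informative.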

From our comparison it is clear that the enzymatic machinery in the PPCA precursor form is in a conformation
virtually identical to that found in the fully active CPW and CPY enzymes. On this basis, the conformation of the
enzymatic machinery found in pPPCA is expected to faithfully represent the conformation that will be found in the
active PPCA.

Active Site, Substrate Specificity. PPCA has a substrate preference for hydrophobic residues in the P1 and/or
P1' binding pockets (Jackman et al. Hypertension 27:925-928 (1993)). In CPW the P1' pocket was identified to consist
of two tyrosine residues (Tyr 60 and Tyr 239) which form a long channel, capped by two acidic residues (Glu 272 and
Glu 398) at the end (Liao et al. (1992), infra). This explains the highest preference of this enzyme for Arg and Lys as
the leaving group (Breddam et al., Carlsberg Res. Commun. 52:297-311 (1987)). In CPY a similarly shaped pocket is
formed by the residues Thr 60, Tyr 256, Leu 272 and Met 398 (Endrizzi et al. (1994), infra). In PPCA the analogous
residues are Tyr 247 and Asp 64, forming the sides of the pocket, with Met 430 and Thr 304 at the far end. This is
reasonably consistent with an overall preference of PPCA for a hydrophobic leaving group.

Inactivation Mechanism of the Precursor Form. During the maturation step of the PPCA precursor form, at
maximum residues 285-298, forming the 'excision' peptide, are removed by an as yet unidentified protease(s). In vitro,
the maturation event can be mimicked by digestion with trypsin, probably utilizing positions Arg 284, as well as Arg 292
and/or Arg 298. The residues forming the 'excision' peptide adopt distinctly different conformations in the two
crystallographically distinct monomers forming the PPCA dimer in our crystal structure. Yet in both monomers this
polypeptide region extends out from the protein surface and is virtually completely solvent and protease accessible
(Figure 9). Arg 284 and Arg 292 are particularly well exposed. The main chain atoms of Arg 298 are less accessible,
being sandwiched between the strand Mβ2 and a loop N-terminal to helix Cα6, while a salt bridge with Glu 264 renders
the side chain atoms of Arg 298 partially solvent inaccessible.

The active site cleft is blocked by numerous residues from the maturation subdomain in the precursor form of
PPCA. The catalytic triad is rendered solvent inaccessible by residues Asn 275, Ile 276 and Phe 277. These residues
are part of the polypeptide Asp 272-Phe 277, which we call the 'blocking' peptide. This peptide is held down
predominantly by hydrophobic contacts of Leu 273, Ile 276, and Phe 277 to the core domain residues Gly 57, Cys 60,
Leu 180, Leu 190, Val 191, Leu 232, Val 235, Ile 246, Leu 280, Leu 282, Met 299 and Ala 373 (Fig 10). In addition,
residue Asn 275 of the blocking peptide appears to fill what might be part of the P1 binding pocket in the mature form.
Further inspection of the blocking peptide suggests that Gly 274, with Ramachandran angles φ = 66° and ψ = 28°, might
play a central role in the strand blocking the active site. A glycine at this position appears critical to allow the
polypeptide chain to adopt a conformation with its main chain at a safe distance from the catalytic triad. This might aid
in allowing the blocking peptide to assume a conformation resistant to autocatalysis. The P1' binding pocket seems to
be well filled by Pro 301 interacting with Thr 304, Tyr 247, Cys 60 and Cys 334. Thus substrate binding is not
possible in the precursor form due to the inaccessibility of the substrate binding pockets.
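
Ramachandran angles such as those quoted for Gly 274 are torsion angles over four consecutive backbone atoms: φ over C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a signed torsion angle from four points. The helper names and the test points are illustrative and are not coordinates from the structure.

```python
from math import atan2, degrees, sqrt


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond b1
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return degrees(atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Illustrative points only: the last point is rotated a quarter turn
# about the central bond relative to the first
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For ψ, the same function would be called with the N, Cα and C atoms of the residue followed by the N of the next residue.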

We conclude that the inactivation mechanism of PPCA is based on blocking of the active site, and not upon
changes in the position of functional groups involved in catalysis/transition state stabilization. The P1, P2 and P1'
binding pockets are all rendered solvent inaccessible. The function of the blocking peptide seems to be to render the
catalytic triad as well as the region around the P1 and P2 binding pockets solvent inaccessible. The blocking peptide,
however, does not assume a conformation that a peptide substrate would adopt. It is carefully positioned in a manner
which is different from that of a productive substrate, thereby avoiding being cleaved by the nearby catalytic residues, which
are correctly poised for catalysis. A crucial observation is that the excision peptide itself does not bind in the active site
cleft. Hence, mere removal of the excision peptide alone is not sufficient to allow solvent or substrate access to the
active site.

Proposed Maturation Event and Extent of Conformational Rearrangement. The active site of the precursor
of PPCA appears to be fully blocked by 49 residues of the maturation subdomain, as shown in Figure 11. Based on the
precursor structure and the comparison with CPW and CPY, it is proposed that a region comprising approximately
residues 254-284 rearranges to free the P1 and P2 binding sites, while the residues 299-302 rearrange to free the P1' binding
pocket. The linker connecting these two segments of polypeptide chain is the 14 amino acid excision peptide Met 285-
Arg 298. The extent of the residues rearranging is likely to be limited by a disulfide bridge between Cys 253 and Cys 303, which
is conserved in the serine carboxypeptidase family. This critical disulfide serves to keep the secondary structure
elements together at the far end of the P1' pocket.

An interesting pair of salt bridges is observed between Arg 262, Asp 300, Glu 264 and Arg 298, four residues
located on strands Mβ1 and Mβ3 of the mixed β-sheet found in the maturation subdomain. This cluster of residues is
strategically positioned at the base of the excision peptide, close to the core domain and 'shielding' the mixed β-sheet via
side chain interactions (see Figure 11). These residues are strictly conserved among the human, mouse and chicken
PPCAs (Galjart et al. (1991), infra). This charge cluster may be affected by a shift from neutral to acidic pH. Arrival
in the endosome/lysosome is expected to result in protonation of either the Asp or the Glu residue or both, resulting in
unfavorable electrostatic interactions and destabilization of this charge cluster. This in turn is expected to promote partial
unfolding of the maturation subdomain, allowing easier access to additional potential cleavage sites, and stimulating removal
of the 'blocking' peptide which fills the active site in the precursor.

A similar double salt bridge has been observed in the aspartic proteinase zymogen pepsinogen between the 
proenzyme segment (Arg 8P) and the enzyme (Arg 308, Glu 13, Asp 304). 

The maturation mechanism for pPPCA appears to be novel among proteases for which the three-dimensional
structure of the zymogen is known. The catalytic triad in the precursor form is in a catalytically competent
conformation. Enzymatic activity is prevented by a 'blocking' peptide. The blocking peptide is, however, different from
the excision peptide and does not get excised from the mature enzyme. This leads to the distinct difference with the
other known maturation mechanisms in that, after disappearance of the excision peptide, up to 35 residues filling the
active site cleft in the PPCA precursor must rearrange to render the catalytic triad solvent accessible (see Figure 12), but
do not get cleaved off. Removal of the excision peptide, and possibly a shift to lower pH in the endosome/lysosome,
appears to be a trigger for this event. The mechanism does not appear to be autocatalytic, as uptake experiments with
cultured galactosialidosis fibroblasts have shown that a mutant PPCA with the catalytic Ser 150 mutated to Ala is
properly targeted and processed. It retains its protective function and, except for the loss of catalytic activity, is
biochemically indistinguishable from the wild type enzyme (Galjart et al. (1991), infra). Surprisingly, the maturation
mechanisms of the serine carboxypeptidases PPCA, CPW and CPY may all differ from each other as well. This is
clearest for CPY, in which a 91 residue polypeptide is cleaved off N-terminally to convert the zymogen to an active
enzyme (Winther and Sørensen, Proc. Natl. Acad. Sci. USA 88:9330-9334 (1991)), as opposed to the excision of a
peptide from within the zymogen generating a two chain active form, as is the case for PPCA and CPW.

Looking at the hydrolase fold family, the catalytic triad is housed in the core domain and the various cap
domains attenuate the biological function by influencing entirely different properties such as: (i) enzyme kinetics,
exemplified by the interfacial activation of lipases (Smith et al., Curr. Opinion in Structural Biology 2:490-496 (1992));
(ii) substrate channeling, as is proposed for acetylcholine esterase (Sussman et al. (1991), infra); (iii) substrate
recognition, proposed for dehalogenase (Franken et al. (1991), infra) and for CPY and CPW (Endrizzi et al.
(1994), infra); and (iv) enzyme inactivation in the case of PPCA.

Biological Implications. Deficiency of the protective protein/cathepsin A (PPCA) in humans results in the
lysosomal storage disease galactosialidosis. PPCA is thought to form a multi-enzyme complex with β-galactosidase and
neuraminidase in the lysosomes, protecting the latter glycosidases in their harsh acidic and protease-rich environment.
PPCA has a 30% sequence identity to the wheat serine carboxypeptidase (CPW) and yeast serine carboxypeptidase
(CPY). It has been shown that PPCA in the precursor form is inactive, but upon maturation, entailing excision of a 2 kDa
peptide, carboxypeptidase activity is released.

The precursor structure reveals an inactivation mechanism that has not been seen before in any of the other
known zymogen structures of proteases (available for the serine-, metallo- and aspartic protease classes). The catalytic
triad seems to have an arrangement poised for catalysis. However, the triad is rendered solvent and substrate
inaccessible by a strand from the maturation subdomain binding in the active site cleft. Surprisingly, this strand, called
the 'blocking' peptide, does not overlap with the 2 kDa 'excision' peptide. Hence, after removal of the excision peptide,
up to 35 additional residues must rearrange in order to unblock the active site cleft. A strategically positioned pair of
salt bridges, comprising Arg 262, Arg 298, Glu 264 and Asp 300 at the base of the excision peptide, is expected to
optionally become destabilized at low pH, unraveling this region of the structure, allowing easier access to cleavage sites
and/or promoting the rearrangement event.

A number of research groups are currently involved in designing enzyme and gene therapy procedures for 
several lysosomal storage diseases. Insight into the three-dimensional structure, protein functioning and stability of 
PPCA, the first enzyme of known structure associated with a lysosomal storage disease and the third human lysosomal 
structure to be determined, may prove useful in future designs of an adequate therapy procedure for galactosialidosis. 
Information from the three-dimensional structure of PPCA might also aid in designing an engineered form of PPCA
with increased stability and a longer half-life.
Table 1: X-ray Data Collection Statistics

resolution                                  51.11-2.2 Å
wavelength                                  [illegible]
space group                                 P2(1)2(1)2
unit cell                                   a = 115.04 Å, b = 148.11 Å, c = 80.97 Å
temperature of data collection              -178°C
No. of observed reflections                 436,709
No. of unique reflections                   67,740
completeness of all data                    95.7%
R_sym for all data                          5.1%
completeness of outer shell (2.26-2.20 Å)   87.0%
R_sym in outer shell (2.26-2.20 Å)          13.0%

$R_{sym} = \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| / \sum_h \sum_i I_i(h)$, where $I_i(h)$ is the $i$-th observation for reflection $h$
and $\langle I(h) \rangle$ is the weighted mean of all the observations.
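
As a non-limiting illustration of the R_sym statistic defined above, the following Python sketch groups repeated
observations by reflection index, forms the weighted mean intensity of each group and accumulates the absolute
deviations. The input layout (tuples of Miller index, intensity and weight) and the function names are assumptions made
for this example only and do not correspond to any particular data reduction program.

```python
from collections import defaultdict


def r_sym(observations):
    """R_sym = sum |I_i(h) - <I(h)>| / sum I_i(h) over all observations.

    observations: iterable of (hkl, intensity, weight) tuples (assumed layout).
    <I(h)> is the weighted mean of the observations of reflection hkl.
    """
    groups = defaultdict(list)
    for hkl, intensity, weight in observations:
        groups[hkl].append((intensity, weight))

    numerator = 0.0
    denominator = 0.0
    for measurements in groups.values():
        total_weight = sum(weight for _, weight in measurements)
        mean = sum(i * w for i, w in measurements) / total_weight
        numerator += sum(abs(i - mean) for i, _ in measurements)
        denominator += sum(i for i, _ in measurements)
    return numerator / denominator


def multiplicity(n_observed: int, n_unique: int) -> float:
    """Average number of observations per unique reflection."""
    return n_observed / n_unique


if __name__ == "__main__":
    # Hypothetical intensities, equal weights; not measured data.
    demo = [
        ((1, 0, 0), 100.0, 1.0),
        ((1, 0, 0), 110.0, 1.0),
        ((0, 1, 0), 50.0, 1.0),
        ((0, 1, 0), 50.0, 1.0),
    ]
    print(f"R_sym = {r_sym(demo):.4f}")
    print(f"multiplicity = {multiplicity(436709, 67740):.2f}")
```

With the reflection counts of Table 1, the same helper gives an average multiplicity of about 6.4 observations per
unique reflection.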
Table 2: Course of Model Building

(statistics using data between 8.0 and 3.0 Å; where a model lists several steps, values separated by "/" are in step order)

Model and steps                               nr. of   nr. of side   mask volume   Rfactor              Rfree                CC                      CCfree
                                              Cα's     chains        (10^4 Å^3)

mol. repl. (mr) / rigid body ref. (rmr) /     331      125                         54.2 / 52.6          52.9                 0.287 / [illegible]     0.244 / 0.318
  calculate NCS matrix
best monomer (bm) / rigid body ref. /         294      228                         55.9 / 53.5          57.4 / 55.0          0.228 / 0.320           0.216 / 0.328
  update NCS matrix
bmc1 (mask 1)                                 373      258           10.8          49.9                 51.3                 0.403                   0.424
bmc2 (mask 1)                                 405      277           10.8          48.6                 48.4                 0.443                   0.478
bmc3 (mask 2) / rigid body ref. /             411      307           9.99          47.1 / 46.9 / 39.4   48.6 / 48.4 / 44.7   0.471 / 0.476 / 0.622   0.491 / 0.492 / 0.562
  positional ref. (pbmc3) / update NCS matrix
bmc4 (mask 1)                                 412      327           10.8          41.7                 43.1                 0.584                   0.585
bmc5 (mask 3)                                 435      387           8.88          39.8                 40.6                 0.621                   0.623
bmc6 (mask 4)                                 442      413           9.11          38.4                 40.2                 0.647                   0.637

Summary of the bootstrapping procedure. The resulting models have been listed chronologically starting
with the molecular replacement solution, i.e. mr (molecular replacement), bm (best monomer core), and
the bootstrapping cycles bmc1 through bmc6. The following statistics are given for the various models:
the number of Cα atoms built per monomer; the number of correct side chains incorporated per monomer;
and the volume of the molecular mask used during the averaging, if applicable. The quality of each model
is assessed using the Rfactor, Rfree, CC and CCfree calculated by X-PLOR for data between 8.0 and 3.0 Å.
After positional refinement of model bmc3, both monomers were made equivalent by taking one monomer
and generating the non-crystallographically related one.
Table 3: Current Status of the Model

Statistics for the data used in refinement:

resolution (Å)    Rfactor (%)    completeness (%)
8.0-4.3           22.4           85.7
4.3-3.5           19.0           89.1
3.5-3.0           20.6           89.1
3.0-2.8           [illegible]    87.9
2.8-2.6           22.3           86.1
2.6-2.4           22.2           84.0
2.4-2.3           22.7           81.3
2.3-2.2           24.0           78.3
8.0-2.2           21.3           [illegible]

Model:

molecules in the asymmetric unit                  2
residues (out of 904 possible)                    902
sugars                                            6
waters                                            [illegible]
r.m.s.d. bond length (Å)                          0.012
r.m.s.d. bond angles (°)                          1.72
average B-values for main chain atoms (Å^2)       16.6
average B-values for side chain atoms (Å^2)       18.3
Table 4

Superposition of the proposed catalytic machinery of the serine carboxypeptidases with known
three-dimensional structure: PPCA, CPW (Liao et al., Biochemistry 31:9796-9812 (1992)) and CPY (Endrizzi
et al., Biochemistry 33:11106-11120 (1994)).

PPCA        CPW        ΔPPCA-CPW (Å)        CPY        ΔPPCA-CPY (Å)

Catalytic triad: 


N 


Scr 146 


N 


(A) 


Scr 146 


N 




Scr 150 


c- 






0.3 




c* 


n a 




C 


His 397 


c 


0.4 


His 397 


c 


U.J 


His 429 


0 




0 


0.3 




o 


0 d 




a 




c» 


0.3 




c* 


ft d 
U.*l 






Asp 338 




0.9 


Asd 338 


O v 


1 1 

1.1 


Asp 372 


N 




N 


1.5 


N 


ft 0 




C* 




c 


0.2 




c* 


ft J 




c 




c 


0.3 




C 


0.4 




0 




0 


0.3 




o 


U.J 




c» 




c» 


0.5 




c* 


0.6 




0* 




o* 


0.3 






0.6 




CM 

c. 




c* 1 
c«. 


0.3 
0.7 




c« 

C*' 


0.5 
0.5 








N* 1 


0.4 




N" 


0.5 
0.4 








N.j 


0.3 




N° 




N 




N 


0.7 




N 


0.5 




c- 




C 


0.2 




C 


0.2 




c 




C 


0.1 




c 


0.1 




o 




o 


0.1 




0 


0.1 




a 




c» 


0.2 




c* 


0.1 




a 




e 


0.3 




c» 


0.2 








o* 1 


0.2 




O" 


0.1 




o« 




o« 


0.2 
0.4 




O" 


0.3 
0.1 



Proposed oxyanion hole, formed by two backbone amides:

PPCA       atom    CPW        atom    ΔPPCA-CPW (Å)    CPY        atom    ΔPPCA-CPY (Å)
Gly 57     N       Gly 53     N       [illegible]      Gly 53     N       0.5
           Cα                 Cα      0.2                         Cα      0.4
           C                  C       0.1                         C       0.4
           O                  O       0.3                         O       0.8
Tyr 151    N       Tyr 147    N       0.3              Tyr 147    N       0.2
           Cα                 Cα      0.2                         Cα      0.1
           C                  C       0.3                         C       0.2
           O                  O       0.5                         O       0.2

Proposed regulation of pH dependent peptidase activity (averaged over all atoms):

PPCA       CPW        ΔPPCA-CPW (Å)    CPY        ΔPPCA-CPY (Å)
Asn 55     Asn 51     0.2              Asn 51     0.2
Glu 69     Glu 65     0.3              Glu 65     0.7
Glu 149    Glu 145    0.4              Glu 145    0.4

The residues forming the proposed catalytic machinery are strictly conserved between the three serine
carboxypeptidases. The deviation in distance between the atoms from PPCA and the equivalent atoms in
CPW or CPY after superposition is given in Ångström.
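
By way of example only, the per-atom deviations reported in Table 4 correspond to simple interatomic distances once
two structures have been brought into a common frame. The Python sketch below assumes the superposition has already
been performed and only measures the resulting distances, together with the per-residue average used for the
"averaged over all atoms" entries. The coordinates shown are hypothetical and are not taken from the PPCA, CPW or
CPY models. The function names are likewise illustrative.

```python
import math
from statistics import mean


def per_atom_deviation(reference, other):
    """Distance (in Å) between equally named atoms of two already superposed residues.

    reference, other: dicts mapping atom name -> (x, y, z) in the same frame.
    Atoms missing from either residue are skipped.
    """
    return {
        name: math.dist(reference[name], other[name])
        for name in reference
        if name in other
    }


def residue_average(deviations):
    """Mean deviation over all atoms of a residue."""
    return mean(deviations.values())


if __name__ == "__main__":
    # Hypothetical coordinates in Å, for illustration only.
    residue_a = {"N": (0.0, 0.0, 0.0), "CA": (1.5, 0.0, 0.0), "OG": (2.0, 1.0, 0.0)}
    residue_b = {"N": (0.3, 0.0, 0.4), "CA": (1.5, 0.2, 0.0), "OG": (2.0, 1.0, 1.0)}

    deviations = per_atom_deviation(residue_a, residue_b)
    for atom, value in deviations.items():
        print(f"{atom:>3}: {value:.1f} Å")
    print(f"average: {residue_average(deviations):.1f} Å")
```

Obtaining the common frame itself requires a least-squares superposition of the selected atoms, which is outside the
scope of this sketch.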
What Is Claimed Is:

1. A method for crystallizing a human protective protein/cathepsin A (PPCA) or
precursor human protective/cathepsin A protein (pPPCA), comprising
(a) providing a purified PPCA or pPPCA;
(b) crystallizing the purified PPCA or pPPCA using a hanging drop or diffusion
method, to provide crystallized PPCA or pPPCA having biological activity,
wherein the crystallized PPCA or pPPCA is resolvable using x-ray crystallography to obtain
x-ray diffraction patterns suitable for three-dimensional structure determination of the PPCA or
pPPCA.

2. A method according to claim 1, wherein said PPCA or pPPCA has at least one
biological activity selected from the group consisting of enzyme protecting activity, enzyme
modulating activity and peptide hydrolyzing activity.

3. A method according to claim 1, wherein said crystallization step is done under
conditions of purified PPCA or pPPCA; 2-30% PEG400-10,000; precipitating salt; buffers and pH
7-9.

4. A method according to claim 3, wherein the crystallization conditions are PPCA or
pPPCA; 5-14% PEG8000, 40-80 mM tromethamine, 0.05-2.0 mM NaN3 and pH 8.0-8.3.

5. A crystallized PPCA or pPPCA, or at least one subdomain thereof, provided by a
method according to claim 1.

6. A method for providing an atomic model of a PPCA or pPPCA, comprising
(a) providing a computer readable medium having stored thereon atomic
coordinate/x-ray diffraction data of said PPCA or pPPCA in crystalline form, said data sufficient to
model the three-dimensional structure of said PPCA, said pPPCA, or at least one subdomain thereof;
(b) analyzing, on a computer using at least one subroutine executed in said computer,
the atomic coordinate/x-ray diffraction data from (a) to provide data output defining an atomic model
of said PPCA or said pPPCA, said analyzing utilizing at least one computing algorithm selected from
the group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity
merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density modification, electron map
visualization, model building, rigid body refinement, positional refinement; and
(c) obtaining atomic model output data defining the three-dimensional structure of
said PPCA, pPPCA or at least one subdomain thereof.

7. A method according to claim 6, wherein said computer readable medium further has
stored thereon data corresponding to a nucleic acid sequence or an amino acid sequence data
comprising at least one structural domain or a functional domain of a PPCA or pPPCA
corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said
analyzing step further comprises analyzing said sequence data.
8. A computer readable medium having stored thereon atomic model data of said PPCA
or pPPCA as the model output data produced by a method according to claim 6.

9. A computer-based system for providing atomic model data of the three dimensional
structure of a PPCA or a pPPCA, comprising the following elements;
(a) a computer readable medium having stored thereon atomic coordinate/x-ray
diffraction data of said PPCA or pPPCA or at least one subdomain thereof;
(b) at least one computing subroutine, that when executed in a computer, causes the
computer to analyze the atomic coordinate/x-ray diffraction data from (a) to provide data output
defining an atomic model of said PPCA or pPPCA, said analyzing utilizing at least one computing
subroutine selected from the group consisting of data processing and reduction, auto-indexing,
intensity scaling, intensity merging, amplitude conversion, truncation, molecular replacement,
molecular alignment, molecular refinement, electron density map calculation, electron density
modification, electron map visualization, model building, rigid body refinement, positional
refinement; and
(c) retrieval means for obtaining atomic model output data defining the
three-dimensional structure of said PPCA, pPPCA or at least one subdomain thereof.

10. A computer-based system according to claim 9, wherein said computer readable
medium further has stored thereon data corresponding to a nucleic acid sequence or an amino acid
sequence data comprising at least one structural domain or a functional domain of a PPCA or
pPPCA corresponding to a portion of the amino acid sequences of Figures 13 or 14, and wherein said
at least one subroutine further includes analyzing said sequence data.

11. A computer readable medium, having stored thereon atomic model data of a PPCA, 
pPPCA, or at least one subdomain thereof, produced by a computer system according to claim 9. 

12. A method for providing a computer atomic model of a ligand of a PPCA or pPPCA,
comprising
(a) providing a computer readable medium according to claim 11, having stored
thereon atomic model data of a PPCA, a pPPCA or at least one subdomain thereof;
(b) providing a computer readable medium having stored thereon atomic model data
sufficient to generate atomic models of potential ligands of PPCA or pPPCA;
(c) analyzing on a computer, using at least one subroutine executed in said computer,
the atomic model data from (a) and the ligand data from (b), to determine binding sites of PPCA or
pPPCA and to provide data output defining an atomic model of a ligand of said PPCA, pPPCA, or
at least one subdomain thereof, said analyzing utilizing computing subroutines selected from the
group consisting of data processing and reduction, auto-indexing, intensity scaling, intensity
merging, amplitude conversion, truncation, molecular replacement, molecular alignment, molecular
refinement, electron density map calculation, electron density modification, electron map
visualization, model building, rigid body refinement, positional refinement; and
(d) obtaining atomic model output data defining the three-dimensional structure of
a ligand of said PPCA, pPPCA or at least one subdomain thereof.

13. A computer readable medium having stored thereon the model output data produced
by a method according to claim 12.

14. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the
atomic model of the ligand model produced by a method according to claim 12.

15. A computer-based system for providing an atomic model of a ligand of a PPCA or
pPPCA, comprising the following elements;
(a) a computer readable medium having stored thereon atomic model data of a PPCA
or pPPCA;
(b) a computer readable medium having stored thereon atomic model data sufficient
to generate atomic models of potential ligands of PPCA or pPPCA;
(c) at least one computing subroutine for analyzing on a computer the atomic model
data of PPCA or pPPCA from (a) and the ligand data from (b), to determine binding sites of PPCA
or pPPCA and to provide data output defining atomic models of potential ligands of PPCA or
pPPCA, said analyzing utilizing at least one computing subroutine selected from the group consisting
of data processing and reduction, auto-indexing, intensity scaling, intensity merging, amplitude
conversion, truncation, molecular replacement, molecular alignment, molecular refinement, electron
density map calculation, electron density modification, electron map visualization, model building,
rigid body refinement, positional refinement; and
(d) retrieval means for obtaining atomic model output data defining the atomic
models of potential ligands of PPCA or pPPCA.

16. A computer readable medium, comprising atomic model output data of a potential
ligand of PPCA or pPPCA, said data produced by a method according to claim 15.

17. An isolated PPCA or pPPCA ligand, corresponding to the physical molecule of the
atomic model of a ligand produced by a computer system according to claim 15.

18. A crystallized pPPCA, having the atomic coordinates presented in Figure 23.1-23.41.
1.538 36.091 -15.788 0.00 0.00 6
0.494 35.250 -15.076 0.00 0.00 6
1.112 33.894 -15.107 0.00 0.00 6
1.508 37.523 -15.301 0.00 0.00 6
28.348 36.037 28.244 1.00 14.86
27.987 37.090 29.198 1.00 10.38
28.482 36.762 30.624 1.00 9.13
27.974 37.632 31.800 1.00 5.95
28.104 36.843 33.076 1.00 9.17
28.720 38.961 31.951 1.00 5.24
29.844 38.547 28.722 1.00 9.76
30.532 39.789 28.423 1.00 13.61
32.006 39.671 28.846 1.00 7.85
32.314 39.574 30.347 1.00 8.30
33.807 39.475 30.541 1.00 6.09
31.760 40.789 31.082 1.00 5.02
30.480 40.352 27.022 1.00 15.33
30.828 41.515 26.824 1.00 13.42
30.153 39.523 26.037 1.00 16.95
30.132 40.000 24.651 1.00 12.20
31.269 39.397 23.791 1.00 11.32
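
For illustration, coordinate rows of the kind listed above may be read into a computer as follows. The Python sketch
assumes that the first five whitespace-separated fields of each row are x, y, z (in Å), occupancy and B-value, in the
manner of a conventional coordinate listing. This column interpretation is an assumption, since atom and residue labels
are not preserved in the rows shown, and the names `CoordinateRow`, `parse_rows` and `mean_b_factor` are hypothetical.

```python
from typing import List, NamedTuple


class CoordinateRow(NamedTuple):
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_rows(text: str) -> List[CoordinateRow]:
    """Parse whitespace-separated rows; skip blank, short or non-numeric lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or truncated line
        try:
            values = [float(field) for field in fields[:5]]
        except ValueError:
            continue  # line still contains damaged, non-numeric characters
        rows.append(CoordinateRow(*values))
    return rows


def mean_b_factor(rows: List[CoordinateRow]) -> float:
    """Average B-value over the parsed rows."""
    return sum(row.b_factor for row in rows) / len(rows)


SAMPLE = """\
30.480 40.352 27.022 1.00 15.33
30.828 41.515 26.824 1.00 13.42
30.153 39.523 26.037 1.00 16.95
"""

if __name__ == "__main__":
    parsed = parse_rows(SAMPLE)
    print(len(parsed), "rows, mean B =", round(mean_b_factor(parsed), 2))
```

Rows that fail numeric conversion are skipped rather than repaired, so that damaged entries are not silently turned
into coordinates.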